Rapid pitch correction in choir singers

Anke Grell(a)
Department of Speech, Music and Hearing, Royal Institute of Technology (KTH), SE-10044 Stockholm, Sweden and Institute for Music Physiology and Musicians’ Medicine (IMMM), Hannover University of Music and Drama, Hohenzollernstrasse 47, DE-30161 Hannover, Germany

Johan Sundberg and Sten Ternström
Department of Speech, Music and Hearing, Royal Institute of Technology (KTH), SE-10044 Stockholm, Sweden

Martin Ptok
Department of Phoniatry and Paediatric Audiology, Hannover Medical School, Carl-Neuberg-Strasse 1, DE-30625 Hannover, Germany

Eckart Altenmüller
Institute for Music Physiology and Musicians’ Medicine (IMMM), Hannover University of Music and Drama, Hohenzollernstrasse 47, DE-30161 Hannover, Germany

(Received 3 June 2008; revised 6 May 2009; accepted 11 May 2009)

Highly and moderately skilled choral singers listened to a perfect fifth reference, with the instruction to complement the fifth such that a major triad resulted. The fifth was suddenly and unexpectedly shifted in pitch, and the singers’ task was to shift the fundamental frequency (F0) of the sung tone accordingly. The F0 curves during the transitions often showed two phases, an initial quick and large change followed by a slower and smaller change, apparently intended to fine-tune voice F0 to complement the fifth. Anesthetizing the vocal folds of moderately skilled singers tended to delay the reaction. The mean response times varied in the range 197–259 ms, depending on the direction and size of the pitch shifts as well as on skill and anesthetization.

© 2009 Acoustical Society of America. [DOI: 10.1121/1.3147508]
PACS number(s): 43.75.Rs [DD]    Pages: 407–413

(a) Author to whom correspondence should be addressed. Present address: Kleiner Schäferkamp 16E, DE-20357 Hamburg, Germany. Electronic mail: [email protected]
J. Acoust. Soc. Am. 126 (1), July 2009

I. INTRODUCTION

The voice is probably the most important tool for interhuman communication. Concerning the acoustic properties of the voice, several aspects contribute to the information transfer. Among them, pitch may be one of the most prominent, as it is used to code emotional arousal, stress, and prosodic and grammatical sentence structure. A deviation from the target pitch contour of an utterance might therefore affect or even destroy the meaning of a spoken sentence.

With respect to pitch control, singers face a specific problem. Not only do they have to reach and maintain the required pitch in a melodic phrase with high precision, but they also need to rapidly adapt their intonation to accompanying instruments or to fellow singers in an ensemble. Professional singers can be expected to be highly skilled in this task, and even semiprofessional choir singers should be able to adapt their pitch quickly, since this is a prerequisite for the joyful experience of harmonious group singing.

The mechanisms allowing such precise pitch control have been the subject of several investigations. Wyke (1967) stressed the relevance of a combined acoustico-laryngeal reflex system. According to Burnett et al. (1998), auditory information from the spiral ganglion in the inner ear reaches, via the ventral and dorsal cochlear nuclei, a collection of nuclei constituting the superior olive. From the superior olive, a fiber tract termed the lateral lemniscus projects to the inferior colliculus. The brainstem pathway then projects directly from the inferior colliculus to the periaqueductal gray and then, via the nucleus retroambiguus and nucleus ambiguus, to the motoneurons of the respiratory system in the spinal cord and to the laryngeal motor system in the nucleus ambiguus. The long cortical auditory-motor pathway follows the ascending auditory path from the inferior colliculus to the medial geniculate nucleus.
The ventral region of the medial geniculate nucleus projects to the primary auditory cortex (area 41), and the dorsal region of the medial geniculate nucleus to the secondary auditory cortex (areas 42 and 22). From there, a pitch control system could project from the anterior cingulate gyrus and from the dorsolateral prefrontal motor region via the limbic system to the periaqueductal gray, and finally follow the same structures as the reflex-like pathway to reach the vocal folds and the respiratory system (Duus, 1995; Deetjen et al., 2005; Friedrich et al., 2004; Schönweiler and Ptok, 2004). Most evidence for these pathways comes from animal research (e.g., Jürgens, 2006). However, empirical findings in humans are also available. Sapir et al. (1983), for example, asked their subjects to vocalize with constant pitch and intensity while receiving auditory stimulation with clicks of different intensities. The electromyogram (EMG) of the cricothyroid muscle and the vocal output were recorded and averaged, using the respective clicks as triggers. The authors found short-latency responses in fundamental frequency (F0) (50 ms) and EMG (11 ms) to the auditory stimulation, which supports the brainstem pathway outlined above. Brown et al. (2008) found two sets of activation peaks in a functional magnetic resonance imaging study in which subjects phonated and performed glottal stops. One ventromedial peak was located deep in the central sulcus and one dorsolateral peak in area 6, which is more superficial. These two peaks outline a kind of wedge, extending from a sulcal position ventrally to a gyral position dorsally and anteriorly. The authors referred to this area as the “larynx/phonation area” and claimed that it is the major region for vocal control in the human motor cortex.
They further suggested a connection between this area and the nucleus ambiguus, which is involved in activating the intrinsic laryngeal muscles.

Over the past decades, several investigations have studied reactions to sudden pitch shifts in the auditory feedback. Burnett et al. (1997, 1998) studied the relevance of the feedback of one’s own voice to the reaction time for pitch corrections in trained singers. Subjects were instructed to phonate at a constant pitch. Without warning, the auditory feedback was manipulated such that the pitch of their own voices, as heard by the subjects, suddenly shifted by about a semitone. Analyzing the F0 change curves, Burnett et al. (1997, 1998) identified two components in most reactions: an early one after about 160 ms on average and a longer-latency response after approximately 300 ms. Kestler et al. (1999) used the same experimental approach and analyzed the influence of the singers’ expertise by comparing opera singers with laymen. When measuring the reaction times, i.e., the time between the change in the auditory feedback pitch and the singer’s pitch change, they found two peaks in the distribution for professional singers, an early one at about 113 ms and a late one at 261 ms, while non-singers showed only one peak, at 135 ms. In both these investigations, the results were interpreted as evidence for two pathways involved in pitch control in singing: a rapid mechanism that may operate via the brainstem and a slower mechanism that may operate via the cerebral cortex. Kestler et al. (1999) proposed that the slower mechanism includes analysis, e.g., of the precise pitch change required to stay in tune. Zarate and Zatorre (2008) recently found neurophysiological support for the idea that singers and non-singers recruit different brain regions for pitch control.
Specifically, when subjects were asked to compensate for a pitch shift in the auditory feedback of their singing, the singers recruited bilateral auditory areas and the left putamen, while non-musicians recruited the left supramarginal gyrus and the primary motor cortex.

The technique of pitch-shifted auditory feedback has also been applied in experiments on speech production. Xu et al. (2004) analyzed the effect of such shifts on the production of Mandarin tone sequences and found that the majority of the compensatory pitch changes occurred at a latency of 143 ms. Applying the same technique, Chen et al. (2007) studied the effects in the production of English speech. They found a mean latency of 122 ms in syllable production and a somewhat longer latency in vowel production.

Singers are likely to rely not only on auditory but also on proprioceptive feedback to reach this level of pitch control. Wyke (1974) found that there are three types of laryngeal mechanoreceptors: stretch-sensitive myotactic receptors in the intrinsic muscles of the larynx, mucosal receptors in the subglottic mucosa, and articular receptors in the fibrous capsules of the intercartilaginous joints. At any change in tension, subglottic pressure, or posture, the information reaches the brainstem via Ia afferents, and motor impulses then return to the larynx via efferent alpha motoneurons as a reflex. From a longitudinal study of pitch accuracy under different conditions, Mürbe et al. (2004) concluded that kinesthetic feedback contributes substantially to singers’ pitch control. Larson et al. (2008) also demonstrated the influence of kinesthetic feedback on vocal pitch control by anesthetizing singers’ vocal folds. They found longer response latencies to pitch-shifted voice auditory feedback in the anesthetized than in the pre-anesthetic condition.
Additionally, they developed a mathematical model suggesting that “early in response, kinesthesia alone provides feedback control, but after about 100 ms, auditory feedback also participates.” However, information on the role of the kinesthetic system in pitch control of the voice is still incomplete.

The aim of the present investigation was to elucidate the physiological mechanisms that underlie pitch control. To this end, the authors measured the distribution of reaction times observed when subjects were required to adjust their F0 in response to a sudden shift of an external tuning reference. Measurements were made under three conditions. In experiments 1 and 2, the effect of training was studied by comparing the response times of highly skilled singers (experiment 1) and moderately skilled singers (experiment 2). Experiment 3 aimed at studying the importance of the laryngeal kinesthetic system by analyzing the effect of anesthetizing the vocal folds.

II. MATERIAL AND METHODS

The subjects in experiment 1 were 13 highly skilled female choir singers, mean age 28.9 years (range 23–39 years). All had extensive musical experience. Their mean choral experience was 19.3 years (range 5–30 years), and they had taken solo singing lessons for 10.2 years on average (range 4–17 years). Eleven moderately skilled female singers, aged 14–24 years (mean 19.9 years), constituted the subject group in experiment 2. On average, they had 10 years of choral singing experience. Five moderately skilled female choral singers participated as subjects in experiment 3. They were all experienced singers from different choirs in Hannover, aged 23–27 years (mean 24.8 years). Two of them had also participated in experiment 2. At the time of the experiments, none of the subjects reported any voice disease, hearing loss, or other medical problems.
The procedure was basically the same in all three experiments, although they were run in different sound-insulated rooms: experiment 1 at the Department of Speech, Music and Hearing, KTH, Stockholm; experiment 2 at the Institute for Music Physiology and Musicians’ Medicine, Hannover, Germany; and experiment 3 at the Department of Phoniatrics and Pediatric Audiology of the Hannover Medical School, Germany.

The subjects were presented with reference dyad stimuli representing the sound of two fellow choral singers singing a fifth interval. Their task was to complement this dyad by singing the missing major third such that a complete major triad resulted. The two tones constituting the reference stimuli were prepared as follows. The tones were sung by a female singer on the vowel /u:/ and recorded digitally one by one on a Yakumo 166 MHz computer. Their F0 values were adjusted in COOL EDIT (1996) so that they formed a perfect fifth dyad (D4 = 293.7 Hz and A4 = 440 Hz). The two tones were then mixed and edited to 5 s duration. Using COOL EDIT, the F0 values of both tones were shifted up or down by either a quarter tone or a semitone at a point in time randomly selected from t = 1.5, 2.0, 2.5, 3.0, or 3.5 s after stimulus onset. The resulting 20 stimuli (4 shifts × 5 times) were recorded in random order, separated by 10 s of silence. The set of 20 stimuli was presented three times to each subject, making a total of 60 stimuli for every participant in each of the three experiments. The initial pitch of the missing third was always F#4 (370 Hz). When the reference pitch shifted, the singers’ task was to adapt to this shift by changing their F0 accordingly. They were asked to perform this correction as quickly and accurately as possible and to phonate continuously until they reached the new target F0.
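The stimulus design described above (4 shifts × 5 shift times, presented three times) can be enumerated in a few lines. The Python sketch below assumes equal-tempered cents arithmetic (100 cents per semitone, 50 per quarter tone) and assumes that the target third moves by the same number of cents as the fifth; the helper name shift_hz, the dictionary layout, and the fixed random seed are illustrative choices, not part of the original procedure.

```python
import itertools
import random

# Frequencies given in the text (Hz)
D4, A4, FSHARP4 = 293.7, 440.0, 370.0
SHIFTS_CENTS = (-100, -50, 50, 100)          # semitone / quarter tone, down / up
SHIFT_TIMES_S = (1.5, 2.0, 2.5, 3.0, 3.5)    # possible shift instants after onset


def shift_hz(freq_hz, cents):
    """Transpose a frequency by a number of cents (1200 cents per octave)."""
    return freq_hz * 2 ** (cents / 1200)


# One stimulus per (shift, time) combination: 4 x 5 = 20
stimuli = [
    {
        "shift_cents": cents,
        "shift_time_s": t,
        "fifth_after_hz": (shift_hz(D4, cents), shift_hz(A4, cents)),
        # Assumption: the sung third should move by the same number of cents
        "target_third_after_hz": shift_hz(FSHARP4, cents),
    }
    for cents, t in itertools.product(SHIFTS_CENTS, SHIFT_TIMES_S)
]

rng = random.Random(2009)   # arbitrary seed, only for a reproducible random order
rng.shuffle(stimuli)

# The set of 20 was presented three times -> 60 trials per participant
session = [s for _ in range(3) for s in stimuli]

print(len(stimuli), len(session))  # 20 60
print(f"D4 after a semitone up: {shift_hz(D4, 100):.1f} Hz")
```

Because both tones of the fifth are multiplied by the same factor, the interval itself is preserved after every shift; only its absolute position changes.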
The procedures were the same in experiments 1 and 2; only the subjects differed, those in experiment 1 being more experienced singers than those in experiment 2.

The anesthesia used in experiment 3 was applied by an experienced phoniatrician. Before local anesthesia was applied, an allergy to lidocaine was ruled out. The subjects were then asked to phonate a neutral vowel /e/. During phonation, and while the tongue was gently held by the examiner, lidocaine was sprayed onto the vocal folds using a commercially available spraying device (Xylocain Pumpspray, AstraZeneca, Wedel, Germany). Two sprays of 20 mg lidocaine 5% were applied. Owing to the position of the tip of the spray device, not only the vocal folds but the whole aditus ad laryngis was anesthetized. The recording was started when the subjects reported throat numbness and swallowing difficulties. According to clinical experience, these effects are reliable signs of an anesthesia sufficient for phonosurgery. The anesthesia was administered in a room close to the experimental setup, so that the subjects could start the experiment immediately after the anesthesia was applied. The experimental procedure was approved by the local ethics committee.

As the subjects’ ability to perceive frequency differences was crucial to the outcome of the main experiment, two additional tests were run in order to determine their just noticeable difference (JND) for F0. These tests were conducted after the first and after the second set of 20 stimuli. In experiment 3, however, the anesthetic effect lasted no longer than about 20 min; therefore, all three experimental blocks of singing were performed in one sequence, and the JND tests were carried out afterwards. In the first JND test, the vowel /u:/ sung at an F0 of 293.8 Hz was presented, followed by the same vowel sound with an F0 increased by n × 1.73 cents, where 0 ≤ n ≤ 20.
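The comparison frequencies of this ladder follow from f(n) = 293.8 Hz × 2^(n × 1.73/1200). The Python sketch below computes them and, as an assumption about how the adaptive rule described in the following text could be run on such a ladder, moves one step down after three consecutive correct answers and one step up after any error. The one-step size, the lower bound at n = 1, the 60-trial length, and the made-up deterministic listener with a 10.5-cent threshold (the mean voice-stimulus JND reported in the Results) are all illustrative. The rule the test software used to compute the final JND is not described, so the sketch stops at the trial track.

```python
F_REF_HZ = 293.8      # reference vowel F0 in the first JND test (from the text)
STEP_CENTS = 1.73     # ladder step size in cents (from the text)
N_MAX = 20            # n runs from 0 to 20 (from the text)


def ladder_hz(n):
    """Comparison F0 for ladder step n: F_REF * 2^(n * 1.73 / 1200)."""
    return F_REF_HZ * 2 ** (n * STEP_CENTS / 1200)


ladder = [ladder_hz(n) for n in range(N_MAX + 1)]
print(f"largest comparison: {ladder[-1]:.1f} Hz")


def run_staircase(hears_difference, start=N_MAX, trials=60):
    """Hypothetical 3-down/1-up staircase over ladder steps 1..N_MAX.

    hears_difference(n) stands in for the listener's yes/no judgment and
    returns True when the answer is correct. Returns the visited steps."""
    n, correct_in_a_row, visited = start, 0, []
    for _ in range(trials):
        visited.append(n)
        if hears_difference(n):
            correct_in_a_row += 1
            if correct_in_a_row == 3:      # three correct -> smaller difference
                n, correct_in_a_row = max(n - 1, 1), 0
        else:                              # one error -> larger difference
            n, correct_in_a_row = min(n + 1, N_MAX), 0
    return visited


def listener(n):
    """Made-up listener who answers correctly at or above 10.5 cents."""
    return n * STEP_CENTS >= 10.5


track = run_staircase(listener)
print(track[-8:])   # oscillates between steps 6 and 7
```

The track settles where a 3-down/1-up rule is designed to settle, around the step at which correct answers start to fail (here between 6 × 1.73 = 10.4 cents and 7 × 1.73 = 12.1 cents).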
The smallest stimulus difference was thus 0 cents, the second smallest 1.73 cents, the third 2 × 1.73 cents, and so on, and the largest 20 × 1.73 cents, corresponding to 299.7 Hz. In the second test, the vowel sounds were replaced with sine tones. The subjects sat in front of a computer screen and listened to two consecutive tones, each 2 s long and separated by a 1.16 s pause. The tones were presented monophonically over two loudspeakers. In a forced-choice task, the singers were asked to decide whether or not they had heard the same stimulus twice and to click accordingly on a yes or a no button on the screen. If three consecutive answers were correct, the frequency difference between the following two tones was decreased; if an answer was wrong, the frequency difference in the following stimulus was increased. After about 10 min the test was stopped, and the subject’s JND was calculated automatically, thus quantifying the subject’s ability to discriminate very small pitch differences.

In all three experiments, the two stimulus loudspeakers and the microphone were located at the corners of an equilateral triangle with sides of 1 m. The distance between the microphone and the subject’s mouth was 5 cm. The equipment used in experiment 1, however, differed from that used in experiments 2 and 3. In experiment 1, the stimuli were played from a Dell Optiplex GX 240 computer and presented monophonically over two loudspeakers (Fostex Personal Monitor 6301B). The subjects’ responses were picked up by an omnidirectional microphone (TCM 110) and recorded on one channel of a digital audio tape (DAT) recorder, while the stimulus from the computer was recorded on the other channel. The stimuli and responses were then digitized and transferred to a laptop (Targa Visionary N251C2) in COOL EDIT by means of a custom-made program (DINODAT, S. Granqvist). Conversions between sound file formats were performed by another custom-made program (AUDIOFIL, S.
Granqvist). The F0 analysis was performed with PRAAT (version 3.8.27). In experiments 2 and 3, a Yakumo computer (166 MHz) and Typhoon PS 56 loudspeakers were used. The responses were picked up by a Sennheiser Black Line microphone connected to a Viscount professional (MM 8) mixing console and an Aiwa HD-S200 DAT recorder. The responses were transferred digitally from the DAT recorder directly into the COOL EDIT program of the Yakumo computer.

FIG. 1. Examples of frequently observed patterns of pitch shift corrections: (a) slow reaction after 723 ms, (b) quick reaction after 50 ms, (c) double pitch change after 220 and 1290 ms, and (d) overshoot reaction after 220 ms. These examples were taken from the highly skilled and the moderately skilled singers’ groups.

For the analysis, it was important to select a method suitable for the material collected. A common way to identify the onset of an F0 reaction is to estimate the variance of the F0 signal prior to the change and then to identify the point in time when the signal leaves this average variation window. However, several subjects sang with a wide vibrato, which generated a very large average variation of F0, whereas other subjects sang with little or no vibrato, producing a quite narrow variation window. Applying this definition of the onset of the F0 change would therefore systematically delay the measured reaction in subjects singing with a wide vibrato, so a different definition had to be applied. The reaction time, defined as the interval between the F0 shift of the reference and the onset of the subject’s response, was measured using the ORIGIN 6 program. This program displayed the subject’s F0 contour together with the target F0 in the same graph [see Figs. 1(a)–1(d)]. The onset of the response was manually identified in these graphs.
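The vibrato problem that ruled out the variance-window criterion can be made concrete with synthetic data. The Python sketch below applies such a criterion (pre-shift mean ± 2 SD, in cents) to two made-up F0 contours that contain the same downward glide and differ only in vibrato extent. The 2 SD criterion, the 5 Hz vibrato rising at the moment of the shift, the 500 cents/s glide, the 1 ms frame rate, and the function names are assumptions for illustration; the study itself identified onsets manually.

```python
import math

FS = 1000          # F0 frames per second (assumed 1 ms resolution)
T_SHIFT = 1.0      # true onset of the F0 change (s)
VIB_RATE = 5.0     # vibrato rate in Hz (assumed); rising phase at T_SHIFT


def synthetic_f0(vib_amp_cents, slope=-500.0, target=-100.0, dur=2.0):
    """F0 contour in cents re: the starting pitch: sinusoidal vibrato plus a
    linear downward glide that starts at T_SHIFT and levels off at `target`."""
    contour = []
    for i in range(int(dur * FS)):
        t = i / FS
        vibrato = vib_amp_cents * math.sin(2 * math.pi * VIB_RATE * t)
        glide = max(slope * (t - T_SHIFT), target) if t >= T_SHIFT else 0.0
        contour.append(vibrato + glide)
    return contour


def variance_window_onset(contour, k=2.0):
    """Delay (s) from T_SHIFT until the contour first leaves mean +/- k*SD of
    the pre-shift baseline; None if it never does."""
    n0 = int(T_SHIFT * FS)
    baseline = contour[:n0]
    mean = sum(baseline) / n0
    sd = math.sqrt(sum((x - mean) ** 2 for x in baseline) / n0)
    for i in range(n0, len(contour)):
        if abs(contour[i] - mean) > k * sd:
            return (i - n0) / FS
    return None


for amp in (5.0, 50.0):   # narrow vs. wide vibrato (semi-amplitude, cents)
    delay = variance_window_onset(synthetic_f0(amp))
    print(f"vibrato +/-{amp:.0f} cents: onset detected {delay * 1000:.0f} ms late")
```

With these settings the narrow-vibrato contour is flagged roughly 20 ms after the true onset and the wide-vibrato contour roughly 110 ms after it, although both contain an identical glide: the wider baseline window alone produces the extra delay.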
The criterion used for the manually identified onset was an F0 shift that approached the new target without interruption, disregarding occasional vibrato undulations during the F0 change. The onset times of the F0 shifts were then collected in an Excel file. To check data reliability, a different experimenter (co-author JS) measured 75 randomly chosen reaction times. The mean difference between the two sets of measurements was 41 ms [standard deviation (SD) 45 ms].

The choice of statistical analysis was complicated by the fact that the experiments had different numbers of subjects. Also, some subjects participated in both experiments 2 and 3. Moreover, according to a Kolmogorov–Smirnov test, the data were not normally distributed. Non-parametric tests were therefore applied. A Kruskal–Wallis test showed that the ranks were significantly different. Therefore, the conditions were compared pairwise by means of Mann–Whitney tests. As the observed values showed considerable scatter, particularly for the highly skilled subjects, outliers had to be excluded. All values exceeding the median by more than three times the interquartile range were considered outliers. The smallest latency classified as an outlier was just above 740 ms. To eliminate outliers in all groups from the computation of means, all values greater than 740 ms were excluded from the calculation of means. In total, 17 values were excluded: 11 for the highly skilled, 5 for the moderately skilled, and 1 for the anesthetized subjects. The total number of observations thus became 1381.

III. RESULTS

The JND for frequency, averaged across all subjects in all three experiments, amounted to 10.5 cents (SD 4.4) for the voice stimulus and to 14.2 cents (SD 5.5) for the sine tone.
Thus, the subjects were able to hear much smaller pitch differences than the quarter tone (50 cents) used for the stimulus shift in the main experiment.

FIG. 2. (Color online) Distribution of reaction times for the three subject groups: (a) highly skilled singers, (b) moderately skilled singers, and (c) moderately skilled singers with anesthetized vocal folds.

In the singing task, subjects performed in accordance with the instructions in most cases, i.e., they changed their F0 quickly and continuously until the new target was reached. In some cases, however, they interrupted their singing when the reference stimulus changed, and in some cases the F0 change was so gradual that its onset could not be determined. This problem was particularly frequent in the group of highly skilled singers. These cases (253, 82, and 7 in experiments 1, 2, and 3, respectively) were eliminated from the subsequent analysis.

Figures 1(a)–1(d) show typical examples of different types of responses, all observed for a quarter-tone shift of the stimulus. Figures 1(a) and 1(b) illustrate a slow (723 ms) and a quick (50 ms) reaction. Interestingly, in both these examples of a descending F0 shift, the rising phase of the vibrato cycle was interrupted by the F0 drop. Figure 1(c) presents an example of a pitch change containing two parts. The first, appearing after 220 ms, is large (+152 cents peak-to-peak, i.e., three times as large as the stimulus shift). The second, appearing after 1290 ms, is small (−32 cents). Figure 1(d) shows an example of a marked overshoot, amounting to +134 cents.

Experiment 3 was run several weeks later than experiment 2, and the reactions of the two subjects who participated in both experiments did not differ notably from those of the rest of the respective groups. Thus, there was no indication of a learning effect.
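The exclusion steps reported in the Methods and above can be cross-checked with simple arithmetic. Assuming every subject completed all 60 trials, subtracting the unscorable trials (253, 82, 7) and the latencies above 740 ms (11, 5, 1) reproduces the trial counts N in Table I (516, 573, 292; 1381 in total). The outlier rule itself (median plus three times the interquartile range) is shown on made-up latencies; the Python sketch uses the standard-library default ("exclusive") quartile method, which may differ from the method in the authors' statistics software.

```python
import statistics

TRIALS_PER_SUBJECT = 60   # 20 stimuli x 3 presentations, as stated in the text

# group: (subjects, unscorable trials, latencies > 740 ms), all from the text
groups = {
    "highly skilled": (13, 253, 11),
    "moderately skilled": (11, 82, 5),
    "anesthetized": (5, 7, 1),
}

analyzed = {
    name: subjects * TRIALS_PER_SUBJECT - unscorable - outliers
    for name, (subjects, unscorable, outliers) in groups.items()
}
print(analyzed, "total:", sum(analyzed.values()))


def outlier_cutoff_ms(latencies_ms):
    """Cutoff described in the text: median + 3 * interquartile range.
    Quartiles use Python's default ('exclusive') method, an assumption."""
    q1, _, q3 = statistics.quantiles(latencies_ms, n=4)
    return statistics.median(latencies_ms) + 3 * (q3 - q1)


# Made-up latencies (ms), only to show the rule in action
sample = [180, 195, 210, 220, 235, 250, 265, 290, 900]
cutoff = outlier_cutoff_ms(sample)
kept = [x for x in sample if x <= cutoff]
print(f"cutoff {cutoff:.0f} ms, kept {len(kept)} of {len(sample)}")
```

Because the rule is anchored at the median and scaled by the IQR, a single extreme latency barely moves the cutoff, which is why it suits skewed reaction-time data better than a mean ± k SD rule.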
The distribution of reaction times for the different groups is shown in Fig. 2, and in terms of box plots in Fig. 3. The means and standard deviations are listed in Table I.

FIG. 3. Box plots of the results for the three indicated subject groups. The boxes represent the interquartile range, the heavy horizontal lines show the medians, and the bars show ±1 standard deviation. Unfilled circles show values exceeding the standard deviation.

TABLE I. Means in milliseconds and standard deviations for all subject groups and all conditions. N is the number of trials.

                              All conditions  Semitones       Quarter tones   Descending      Ascending
                             Mean   SD     N Mean   SD     N Mean   SD     N Mean   SD     N Mean   SD     N
Highly skilled singers        227  120   516  216  113   265  238  126   251  236  112   246  219  126   270
Moderately skilled singers    206  135   573  197  135   283  216  136   290  199  127   285  214  143   288
Anesthetized vocal folds      251  122   292  243  129   147  259  114   145  246  110   147  257  133   145
Groups pooled                 223  128  1381  214  127   695  233  129   686  222  120   678  224  136   703

As the Kruskal–Wallis test indicated that the ranks were significantly different, the groups’ latencies were compared pairwise. A Mann–Whitney test for two independent samples showed that the difference between the moderately skilled and the anesthetized groups was significant (p < 0.001). The same was true for the difference between the anesthetized and highly skilled groups (p < 0.001) and for the difference between the moderately and highly skilled groups (p < 0.001). The mean values showed that anesthesia slowed down the reaction time in all conditions. Moreover, high skill tended to be associated with slightly longer latencies than moderate skill in all conditions.
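The "Groups pooled" row of Table I should equal the N-weighted average of the three group means in each condition. The Python check below recomputes it from the tabulated values (column labels as in Table I; the dictionary keys are illustrative names). Because the tabulated means are rounded to whole milliseconds, agreement is expected only to within about 1 ms.

```python
# Table I values: (mean ms, N) for highly skilled, moderately skilled,
# and anesthetized groups, per condition
table = {
    "all":        [(227, 516), (206, 573), (251, 292)],
    "semitones":  [(216, 265), (197, 283), (243, 147)],
    "quarter":    [(238, 251), (216, 290), (259, 145)],
    "descending": [(236, 246), (199, 285), (246, 147)],
    "ascending":  [(219, 270), (214, 288), (257, 145)],
}
reported_pooled = {"all": 223, "semitones": 214, "quarter": 233,
                   "descending": 222, "ascending": 224}

for condition, cells in table.items():
    n_total = sum(n for _, n in cells)
    weighted = sum(mean * n for mean, n in cells) / n_total
    print(f"{condition:10s} N={n_total:4d}  weighted mean={weighted:6.1f} ms  "
          f"(Table I: {reported_pooled[condition]} ms)")
```

The same table also satisfies two bookkeeping identities: the semitone and quarter-tone counts, and the descending and ascending counts, each add up to the 1381 trials analyzed.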
The Mann–Whitney test further revealed that the direction of the pitch change did not significantly affect latency when the three subject groups were pooled (p > 0.1). Looking at the subject groups separately, the moderately skilled subjects, with and without anesthesia, showed the same nonsignificant result as the pooled group (both p > 0.5), while the highly skilled singers showed significantly shorter latencies for descending intervals (p < 0.01). With regard to interval size, quarter-tone shifts were associated with significantly slower reactions both in the pooled data (p < 0.01) and in each group separately (all p < 0.05).

IV. DISCUSSION

The authors studied how quickly choir singers adapted their intonation to a change in a reference pitch. Burnett et al. (1997) carried out a somewhat related experiment in which the pitch of singers’ auditory feedback was suddenly shifted while the subject was instructed to sustain a tone and keep the pitch constant. Under these conditions they observed a mean latency of 159 ms (range 104–223 ms) between the onset of the pitch shift of the auditory feedback and the onset of the subject’s attempt to correct F0. This value is smaller than the mean values of 227 and 206 ms observed in experiments 1 and 2. It is likely that the responses reported by Burnett et al. (1997) were unconscious, whereas the present study analyzed deliberate F0 changes. In their experiment, the subjects were frequently unaware even of the pitch changes that they were performing. Our subjects, by contrast, were performing a far more complex task that required analysis of both the direction and the magnitude of the pitch shift.

The highly skilled group showed greater mean latencies than the moderately skilled group. It is tempting to speculate that this may be related to the neurophysiological difference between singers and non-singers observed by Zarate and Zatorre (2008).
Quoting these authors: “Through years of training and experience, singers have learned that they need to monitor their auditory feedback closely to ensure that their notes are produced correctly.” Moderately skilled singers may be less perfectionistic than highly skilled singers in their attempts to produce intended pitches. Such perfectionism may require recruiting the motor cortex and hence take more time.

Anesthetizing the vocal folds slowed down the reaction time, indicating that kinesthetic feedback is an important part of singers’ pitch control system. Mürbe et al. (2004) found experimental evidence for the same conclusion.

The semitone shifts were followed by shorter latencies than the quarter-tone shifts. This may simply be an effect of training, since singers are more accustomed to semitone than to quarter-tone intervals.

Burnett et al. (1997) and Kestler et al. (1999) suggested the involvement of a double pathway for the control of vocal pitch, and other investigations have also suggested the existence of such a double pathway. Furthermore, as mentioned, Zarate and Zatorre (2008) observed that different subjects used different brain areas for the purpose of pitch control. One possible manifestation of a double pathway in our results would be that some reactions were early and some late, i.e., a double-peaked distribution. The histograms in Fig. 2, however, show no distribution of this type. On the other hand, a double pathway could also manifest itself differently in our data: many individual F0 curves showed two phases. A typical example of such a curve is shown in Fig. 1(c).

Constructing models of the human pitch control system seems a promising avenue for further elucidating the pathways involved. For example, Guenther et al.
(2006) presented a neural network model whose components correspond to regions of the cerebral cortex and the cerebellum, including premotor, motor, auditory, and somatosensory cortical areas. Another attempt was presented by Xu et al. (2004).

Several mechanisms, some conscious and others unconscious, are likely to have been involved in the subjects’ behavior. Reflexes would cause quick, though imprecise, reactions. The authors speculate that very quick F0 shifts such as the one illustrated in Fig. 1(b) (50 ms) are produced by such an imprecise reflex system, while slow reactions such as that in Fig. 1(a) (723 ms) and the second F0 change in Fig. 1(c) (1290 ms) rely on conscious control. Hain et al. (2000) also observed an early and a late component in subjects’ F0 changes. Interestingly, subjects showed such double responses more often when they were instructed to produce specific voluntary responses. The first response was often incorrect, whereas the second, later one almost always followed the instruction. This seems to provide further support for the idea of an early, subcortical, unconscious reaction and a consciously controlled cortical mechanism.

Our results also seem relevant to choral singing practice. They showed that most singers react rather quickly to a shift in an external auditory pitch reference. In a choir, the fellow singers generally provide this reference, and it is important that the entire ensemble synchronize their pitch changes. At least in many amateur choirs, one individual in each voice section tends to act as a leader, and the fellow singers mainly follow this leader when they sing. Indeed, this structure of the choral group was systematized in the Baroque period, when the leaders were called “concertisten” and the followers “ripienisten,” as described by the German music performance expert Wilhelm Ehmann (Ehmann, 1961).
Incidentally, this structure is also systematically implemented in today’s orchestras in the form of the leaders, or principals, of each instrument group. Xu et al. (2004) found that in Mandarin speakers the mean response latency of 164 ms was comparable to the mean syllable duration. The situation in singing seems similar. In coloratura singing, typical note durations lie in the vicinity of 125 ms (Huron, 2001; Lindblom and Sundberg, 2007). This is too short to allow a slow correction of pitch, since a correction with a latency of 164 ms or more would begin only after the note had ended. This implies that the strategy of ripienisten simply shadowing concertisten in choral ensembles is not appropriate in fast music. Instead, singers need to hit the target closely enough on the first attempt; that is, they must know the entire sequence of pitches before they start to sing it.
V. CONCLUSION
Our investigation of the reaction times in different groups of singers has shown that, typically, highly skilled choir singers reacted to a change in a pitch reference after 227 ms, while moderately skilled choir singers’ reactions appeared after 206 ms. Anesthetization of the moderately skilled singers’ vocal folds tended to slow down the reaction. The reaction time was typically shorter when the reference was shifted by a semitone rather than by a quarter tone. Indications of a double pathway for F0 control were not found in the distribution of reaction times, but many singers’ F0 curves during the transitions showed two phases, an initial quick and large change followed by a slower and smaller change, apparently aiming at a fine-tuning of the pitch.
ACKNOWLEDGMENTS
Mikael Bohman kindly assisted with some of the MATLAB processing. Henrik Jansson kindly discussed some control-theory aspects of this investigation with us. Friederike Lipka helped complete the statistics. Kathrin Lüerßen performed the anesthesia of the vocal folds in the Department of Phoniatry and Paediatric Audiology at Hannover Medical School. We thank Dietrich Parlitz, who supported the first idea of the study and the pilot experiment. The work on experiment 1 was done in Stockholm within the Marie Curie Fellowship Program, sponsored by the European Commission.
Brown, S., Ngan, E., and Liotti, M. (2008). “A larynx area in the human motor cortex,” Cereb. Cortex 18, 837–845.
Burnett, T. A., Freedland, M. B., Larson, C. R., and Hain, T. C. (1998). “Voice F0 responses to manipulations in pitch feedback,” J. Acoust. Soc. Am. 103, 3153–3161.
Burnett, T. A., Senner, J. E., and Larson, C. R. (1997). “Voice F0 responses to pitch-shifted auditory feedback: A preliminary study,” J. Voice 11, 202–211.
Chen, S., Liu, H., Xu, Y., and Larson, C. (2007). “Voice F0 responses to pitch-shifted voice feedback during English speech,” J. Acoust. Soc. Am. 121, 1157–1163.
Deetjen, P., Speckmann, E. J., and Hescheler, J. (2005). Physiologie (Physiology) (Elsevier Urban Fischer, Amsterdam).
Duus, P. (1995). Neurologisch-Topische Diagnostik (Neurologic-Topical Diagnostics) (Thieme, Stuttgart).
Ehmann, W. (1962). Concertisten und Ripienisten in der h-moll Messe Johann Sebastian Bachs (Concertists and Ripienists in the Mass in B Minor by Johann Sebastian Bach) (Bärenreiter, Kassel, Germany).
Friedrich, G., Bigenzahn, W., and Zorowka, P. (2004). Phoniatrie und Pädaudiologie (Phoniatrics and Paedaudiology) (Huber, Bern).
Guenther, F. H., Ghosh, S. S., and Tourville, J. A. (2006). “Neural modeling and imaging of the cortical interactions underlying syllable production,” Brain Lang. 96, 280–301.
Hage, S. R., Jürgens, U., and Ehret, G. (2006). “Audio-vocal interaction in the pontine brainstem during self-initiated vocalization in the squirrel monkey,” Eur. J. Neurosci. 23, 3297–3308.
Hain, T. C., Burnett, T. A., Kiran, S., Larson, C. R., Singh, S., and Kenney, M. K. (2000). “Instructing subjects to make a voluntary response reveals the presence of two components to the audio-vocal reflex,” Exp. Brain Res. 130, 133–141.
Huron, D. (2001). “Tone and voice: A derivation of the rules of voice-leading from perceptual principles,” Music Percept. 19, 1–64.
Kestler, C., Parlitz, D., and Altenmüller, E. (1999). “Experimentelle Studie zur schnellen Tonhöhenkorrektur bei Sängern und Nichtsängern (An experimental study in rapid pitch correction of professional singers and nonsingers),” Diploma thesis, Hannover University of Music and Drama, Hannover, Germany.
Larson, C. R., Altman, K. W., Liu, H., and Hain, T. C. (2008). “Interactions between auditory and somatosensory feedback for voice F0 control,” Exp. Brain Res. 187, 613–621.
Lindblom, B., and Sundberg, J. (2007). “The human voice in speech and singing,” in Springer Handbook of Acoustics, edited by T. Rossing (Springer, Heidelberg, Germany), Chap. 16, pp. 669–712.
Mürbe, D., Pabst, F., Hofmann, G., and Sundberg, J. (2004). “Effects of a professional solo singer education on auditory and kinesthetic feedback: A longitudinal study of singers’ pitch control,” J. Voice 18, 236–241.
Sapir, S., McClean, M. D., and Larson, C. R. (1983). “Human laryngeal responses to auditory stimulation,” J. Acoust. Soc. Am. 73, 315–321.
Schönweiler, R., and Ptok, M. (2004). Phoniatrie und Pädaudiologie (Phoniatrics and Paedaudiology) (Hannover Medical School, Hannover, Germany).
Wyke, B. D. (1967). “Advances in the neurology of phonation: Phonatory reflex mechanisms in the larynx,” British J. Communic. 2, 2–14.
Wyke, B. D. (1974). “Laryngeal neuromuscular control systems in singing. A review of current concepts,” Folia Phoniatr. Logop. 26, 295–306.
Xu, Y., Larson, C., Bauer, J., and Hain, T. (2004). “Compensation for pitch-shifted auditory feedback during the production of Mandarin tone sequences,” J. Acoust. Soc. Am. 116, 1168–1178.
Zarate, J. M., and Zatorre, R. J. (2008). “Experience-dependent neural substrates involved in vocal pitch regulation during singing,” NeuroImage 40, 1871–1887.