Training native English speakers to identify Japanese vowel length contrast with sentences at varied speaking ratesa) Yukari Hirata,b兲 Elizabeth Whitehurst, and Emily Cullings Colgate University, Department of East Asian Languages and Literatures, Hamilton, New York 13346 共Received 18 August 2006; revised 26 March 2007; accepted 26 March 2007兲 Native English speakers were trained to identify Japanese vowel length in three types of training differing in sentential speaking rate: slow-only, fast-only, and slow-fast. Following Pisoni and Lively’s high phonetic variability hypothesis 关Pisoni, D. B., and Lively, S. E., Speech Perception and Linguistic Experience, 433–459 共1995兲兴, higher stimulus variability by means of training with two rates was hypothesized to aid learners in adapting to speech rate variation more effectively than training with only one rate. Trained participants identified the length of the second vowel of disyllables, short or long, embedded in a sentence of the respective rate, and received immediate feedback. The three trained groups’ abilities before and after training were examined with tests containing sentences of slow, normal, and fast rates, and were compared with those of a control that was not trained. A robust effect of slow-fast training, a marginal effect of slow-only training, but no significant effect of fast-only training were found in the overall test scores. Slow-fast and slow-only training showed small advantages over fast-only training on the fast-rate test scores, while effects for all three training types were found on the slow- and normal-rate test scores. The degree to which the results support the high phonetic variability hypothesis is discussed. © 2007 Acoustical Society of America. 关DOI: 10.1121/1.2734401兴 PACS number共s兲: 43.71.Hw, 43.71.Es 关ARB兴 I. INTRODUCTION This study examined the way non-native speakers perceive Japanese vowel length contrast in sentences produced at various speaking rates. The vowel length distinction 共short versus long兲 is phonemic in Japanese, e.g., /kuTo/ “black” versus /kuTob/ “hardships,” and all five short vowels /i e a o u/ contrast with the corresponding long vowels 共/ib eb ab ob ub/兲 共Vance, 1987兲. Long vowels are 2.2–3.2 times longer in duration than short vowels, whereas only small differences have been observed between the formant frequencies of short and long vowels 共Kondo, 1995; Tsukada, 1999; Ueyama, 2000; Hirata, 2004a; Hirata and Tsukada, 2004兲. This vowel length distinction, as well as a consonant length distinction 共e.g., /kata/ “shoulder” versus /katba/ “won”兲 has been found difficult for native English 共NE兲 speakers to acquire 共e.g., Landahl et al., 1992; Han, 1992; Landahl and Ziolkowski, 1995; Yamada et al., 1995; Toda, 1997; Oguma, 2000; Tajima et al., 2002; Hirata, 2004b兲. Native Japanese speakers’ primary perceptual cue to the Japanese vowel length distinction is vowel duration 共Fujisaki et al., 1997兲, but a long vowel spoken quickly can be shorter than a short vowel spoken slowly 共Hirata, 2004a兲. Speaking rate variation is thought to contribute to the difficulty of acquiring the length distinctions. Research over the past 50 years has shown how a native speaker’s perception of a phonetic segment is affected by the a兲 Portions of this work were presented at the International Conference on Spoken Language Processing, Pittsburgh, September 2006, and at the fourth Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, Honolulu, Hawaii, December 2006. b兲 Author to whom correspondence should be addressed. Electronic mail: [email protected] J. Acoust. Soc. Am. 121 共6兲, June 2007 Pages: 3837–3845 rates of the segment and of the surrounding context 共Denes, 1955; Port, 1977; Verbrugge and Shankweiler, 1977; Miller and Liberman, 1979; Port and Dalby, 1982; Johnson and Strange, 1982; Gottfried et al., 1990; Newman and Sawusch, 1996; Sawusch and Newman, 2000兲. For example, Pickett et al. 共1999兲 found that native Italian speakers’ perception of Italian single and geminate stops was based on the stop closure duration relative to the preceding vowel. Listeners also use the speaking rate of a sentence to perceive native language 共L1兲 contrasts of which the primary acoustic correlate is duration 共e.g., Miller, 1987, and Wayland et al., 1994, for English /p/-/b/ contrast; Magen and Blumstein, 1993, for Korean short and long vowels兲. These studies in L1 perception show rate normalization: native listeners do not identify a phonetic segment based on the absolute duration of the segment. Rather, their perception takes into account both the duration of neighboring segments and the segments that are contrastive in the language. Consistent with the findings above, native Japanese 共NJ兲 speakers use the rate of a sentence as a perceptual cue for distinguishing Japanese consonant and vowel length 共Hirata, 1990a; Hirata and Lambacher, 2004兲. NJ listeners in Hirata 共1990a兲 identified the pair of words /ita/ “stayed” versus /itba/ “went” based on the durational ratio of the stop closure to the preceding vowel /i/ when the words were presented in isolation. In a second experiment, an edited word 关it共b兲ta兴 in which the durational ratio was identified as ambiguous by NJ listeners was embedded in a carrier sentence produced at different speaking rates. NJ listeners identified this ambiguous word as /ita/ or /itba/ based on the rate of the sentence. When the sentence was spoken slowly, the word was clearly identified as /ita/, and when the sentence was spoken faster, 0001-4966/2007/121共6兲/3837/9/$23.00 © 2007 Acoustical Society of America 3837 the word was clearly identified as /itba/. The perceptual cue present in the sentence rate superseded the ambiguous cue present in the isolated word. Compared to rate normalization by native listeners, less is known as to how second language 共L2兲 learners normalize speech rate and learn to make duration-based distinctions in sentences. In Hirata 共1990b兲, when the Japanese words /ita/ and /itba/ were presented in isolation, NE speakers distinguished the words, similarly to NJ speakers, using the durational ratio of the stop closure to the preceding vowel as a perceptual cue. However, when the words were embedded in a sentence spoken at different rates, NE speakers were unable to use the rate of the sentences as a perceptual cue as NJ listeners did. Thus, NE learners of Japanese have difficulty in normalizing rate over a sentence in Japanese, even though they can use the localized perceptual cue when isolated words are presented. What method would most effectively enable NE speakers to normalize the rate of sentences and identify vowel length in an embedded word accurately? Given the results of Hirata 共1990b兲, training L2 learners on isolated syllable or word contexts might not be the best method. The development of training methods for difficult L2 contrasts has been one of the most fruitful areas of L2 research 共English /[/-/l/: Strange and Dittmann, 1984; Logan et al., 1991; Bradlow et al., 1997; English /ð/-//: Jamieson and Morosan, 1986; Mandarin tones: Wang et al., 1999; Korean stops: Francis and Nusbaum, 2002; Japanese length contrasts: Yamada et al., 1995; Kawai and Hirose, 2000; Tajima et al., 2002兲. However, these studies provided L2 contrasts in isolated syllables or words. Investigation of effects of training using sentence stimuli has only recently begun 共Hirata, 2004b; Hirata, 2004c; Kato et al., 2005兲. Studying the L2 vowel length distinction in a sentence context is particularly important, as research has shown that native listeners use the rate of sentences for making duration-based distinctions 共Hirata, 1990a; Hirata and Lambacher, 2004兲. A recent study 共Kato et al., 2004兲 showed that, after perceptual training with isolated words, NE speakers learned to perceive short and long vowels in sentences, indicating that their ability gained for the isolated word context generalized to the sentence context 共see also Hirata, 2004b兲. However, Tajima et al.’s 共2005兲 results showed that the accuracy of NE speakers’ perception of Japanese length contrast was affected by the speaking rate of test stimuli, and that this effect did not decrease after training with isolated words. Pisoni and Lively’s 共1995兲 high phonetic variability hypothesis provides a useful guide in the search for a method which would enable NE speakers to successfully identify Japanese vowel length. This hypothesis claims that the diversity of acoustic cues present in materials produced by multiple speakers helps rather than hinders non-native listeners in forming new perceptual categories. Traditionally, stimulus variability in voices and speaking rates was viewed as troublesome noise, extraneous to the encoding of linguistic units, and was deliberately excluded from experiments by the use of carefully controlled materials 共Pisoni, 1997兲. However, evidence has accumulated that high stimulus variability in voices and phonetic contexts is informative, rather than 3838 J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007 distracting, and actually assists perceptual learning 共Pisoni and Lively, 1995; Bradlow et al., 1997兲. In the present study, we tested this high variability hypothesis with regard to speaking rate variation. We hypothesized that providing variation in speaking rate helps non-native listeners to normalize rate and to learn to distinguish vowel length accurately. The present study compared the relative effectiveness of three types of perceptual training: one with only a slow rate, one with only a fast rate, and one with both slow and fast rates. Training materials were sentences naturally produced by four NJ speakers at these rates. We chose this method rather than creating various speaking rates with a synthesizer because faster natural speech is not simply compressed slower speech, but involves changes in the size and velocity of various muscle contractions 共Gay and Hirose, 1973; Gay et al., 1974; Gay, 1981兲. According to the hypothesis tested in the present research, the variations that exist in natural speech spoken at slow and fast rates are informative for nonnative listeners. The prediction was that speech materials spoken at two rates 共slow and fast兲, contain more diverse and rich acoustic cues than those spoken only at one rate 共slowonly or fast-only兲, and that this diversity aids non-native listeners in adjusting to different rates of speech and identifying short/long vowels accurately. Effects of the three types of training were examined by a pretest and a post-test consisting of sentences produced by two additional NJ speakers at three speaking rates: slow, normal, and fast. These three rates were included in order to examine whether the ability gained through training with a given rate or rates would generalize to rate共s兲 to which participants had not been exposed. The present study compared each type of training with a control group that did not participate in training. Comparison with a control was important because our pilot results indicated effects of repetition and exposure, i.e., improvement of test scores simply by taking the tests twice without training. Thus, training is said to be “effective” if the amount of improvement from the pretest and the post-test was significantly greater for the trained than the control group. The study addressed the following questions: 共1兲 How do NE speakers perform in identifying Japanese vowel length in carrier sentences spoken at three speaking rates 共slow, normal, and fast兲? Previous studies have shown that rate effects interacted with word type 共Hirata, 2005a兲, vowel length 共i.e., short or long; Hirata, 2005b兲, and context 共i.e., in isolation or in sentences; Tajima et al., 2002 and Tajima et al., 2005兲. In these studies, identification accuracy in the sentence context is generally lower for faster rates. The present study will examine if this result is replicable. 共2兲 Does perceptual training with sentences, as opposed to isolated words or syllables as in most previous studies, aid NE speakers in identifying Japanese vowel length accurately? This study will determine whether the present three types of training with sentences enable effective L2 contrast learning. 共3兲 Does training with sentences spoken at two rates 共slow Hirata et al.: Japanese vowel length in varied rates TABLE I. Training stimuli. Block and speaker Show-only training Session 1 Session 2 Session 3 Session 4 Fast-only training Session 1 Session 2 Session 3 Session 4 Slow-fast training Session 1 Session 2 Session 3 Session 4 共a兲–共h兲 共a兲–共h兲 共a兲–共h兲 共a兲–共h兲 Male 1 Female 1 Male 2 Female 2 共a兲–共h兲 共a兲–共h兲 共a兲–共h兲 共a兲–共h兲 Male 1 Female 1 Male 2 Female 2 共a兲–共h兲 共a兲–共h兲 共a兲–共h兲 共a兲–共h兲 Male 1 Female 1 Male 2 Female 2 Carrier sentence Rate ssssssssa ssssssss ssssssss ssssssss soko wa___to kaite aTimasu 共‘___ is written there.’兲 ffffffff fffffiff ffffffff ffffffff sfsfsfsf sfsfsfsf sfsfsfsf sfsfsfsf a Indicates rate of stimuli that was presented within each of the eight blocks. “s”⫽slow rate. “f”⫽fast rate. 共a兲–共h兲 indicate blocks. and fast兲 improve overall perceptual accuracy more than training with only one rate 共slow-only or fast-only兲? According to Pisoni and Lively’s 共1995兲 hypothesis, the overall improvement should be higher for two-rate than one-rate training. 共4兲 How does the perceptual ability of NE speakers generalize from trained rate共s兲 to tested rates? For example, does slow-only training improve accuracy only for the slow rate, or improve it for other rates as well? Does slow-fast training improve accuracy only for these two rates, but not for the untrained 共normal兲 rate? II. METHODS A. Participants Sixty-two monolingual native speakers of American English, whose ages ranged from 18 to 22, were randomly assigned to one of the following four groups: slow-only training group 共n = 16; mean age⫽19.2兲, fast-only training group 共n = 16; mean age⫽19.2兲, slow-fast training group 共n = 14; mean age⫽19.7兲, and a control group 共n = 16; mean age ⫽19.3兲. None of the participants had previously studied Japanese or had extensive exposure to spoken Japanese. All but one participant had studied one or two foreign languages, usually Spanish and/or French, and one participant had studied five languages. None of the participants, however, had native-level fluency in any foreign language. No participant reported having any hearing problems. Participants were paid for their participation. B. Training procedure Participants assigned to one of the three training groups 共slow-only, fast-only, and slow-fast兲 completed four training sessions in addition to a pretest and a post-test. Participants completed the experiment over a minimum period of 11 days and a maximum of 17 days. Participants had any two consecutive training sessions at least 24 hours apart, and no more than four days apart. 3839 J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007 Training materials were a subset of materials used in Hirata 共2004a兲. The target words were nonsense Japanese words in the form of /mVmV/ and /mVmVV/ 共V⫽/i, e, a, o, u/; e.g., /mimi/ vs /mimib/兲, with the pitch accent always on the first vowel. These words were embedded in a single carrier sentence spoken by four native Japanese speakers 共Male 1, Female 1, Male 2, and Female 2; Table I兲. For slow-rate and fast-rate training materials, the speakers were instructed to speak as slowly and as fast as possible, respectively. The four speakers’ mean speaking rate was 263 ms/ mora for the slow rate and 105 ms/ mora for the fast rate.1 Each type of training consisted of four sessions. Each session contained 160 stimuli by one speaker. As described in Table I, only slow-rate materials were used for slow-only training, and only fast-rate materials of the same speakers were used for fast-only training 关5 vowels⫻ 2 lengths⫻ 2 repetitions⫻ 8 blocks兴. For slow-fast training, both of those slow- and fast-rate materials were used, but the total number of trials was the same as that in other training 关2 rates⫻ 5 vowels⫻ 2 lengths⫻ 2 repetitions⫻ 4 blocks兴. For slow-fast training, the rate of stimuli stayed constant within each block, either slow or fast. Slow-rate and fast-rate blocks were presented alternately 共see Table I for the block structure兲. Each session was broken into 8 blocks of 20 trials, and the first and the fifth blocks were preceded by three examples. When participants heard audio stimuli through headphones attached to a PC, the carrier sentence, soko wa___ to kaite arimasu 共“___ is written there”兲 was displayed on the screen. Participants were asked to identify whether the second vowel of the disyllables inserted in the underlined part of the sentence was short or long. The participants received feedback on each response during training. The feedback consisted of a screen with the correct answer written out as either “short” or “long” and the target word spelled out in Romanized letters. Words with short vowels 共containing two moras兲 had two dots over them, and words with a long vowel 共containing three moras兲 had three dots. When participants answered correctly, the sign “Correct” appeared on the screen, and it enabled the participants to go on to the next Hirata et al.: Japanese vowel length in varied rates TABLE II. Test stimuli. Block and speaker Carrier sentence Rate soTe ga ___ da to omoimasu 共‘I think that is ___.’兲 asoko ni ___ to aTimasu 共‘It says ___ over there.’兲 koko wa ___ dca aTimasen 共‘This is not ___.’兲 hontob ni ___ wa kaitenai 共‘It’s true that ___ is not written.’兲 soko de ___ to wa iwanakatba 共‘I didn’t say ___ there.’兲 kitbo ___ de wa nai debob 共‘It certainly won’t be ___.’兲 snfa koTe ga ___ da to kikimabita 共’I heard that this is ___.’兲 koko ni ___ to aTimasu ne 共‘It says ___ here, right?’兲 aTe wa ___ dca nai desu jo 共‘I tell you, that’s not ___.’兲 zetbai ni ___ wa kakaTeteta 共‘___ was definitely written.’兲 sobite ___ to itbe kudasai 共‘And then, please say ___.’兲 tabun ___ de wa aTimasen 共‘It’s probably not ___.’兲 snf Pretest 共a兲 Male 3 共b兲 Female 3 共c兲 Male 3 共d兲 Female 3 共e兲 Male 3 共f兲 Female 3 snf snf snf snf snf Post-test 共a兲 Male 3 共b兲 Female 3 共c兲 Male 3 共d兲 Female 3 共e兲 Male 3 共f兲 Female 3 snf snf snf snf snf a “snf” indicates that stimuli of slow, normal, and fast rates were presented randomly within the block. 共a兲–共f兲 indicate blocks. trial. When participants answered incorrectly, the sign “Sorry…” appeared, and they were made to click “Play again” to hear the stimulus repeated three additional times. In each session, participants were asked to take a threeminute break after the fourth block. C. Testing procedure All of the four groups 共slow-only, fast-only, slow-fast, and control兲 took a pretest and a post-test approximately two weeks apart. Each test consisted of 180 stimuli. Each stimulus used one of ten words drawn from five pairs of real Japanese words: /Tubi/-/Tubib/, /ise/-/iseb/, /Tika/-/Tikab/, /kato/-/katob/, and /saju/-/sajub/ 共meaning “agate”-“ruby,” “共name of a place兲”-“opposite gender,” “science”-“liquor,” “transition”-“共surname兲,” and “hot water”-“left and right”兲, all accented on the first vowel. The stimuli were recorded by two NJ speakers who did not appear in the training sessions 共Male 3 and Female 3; Table II兲. The NJ speakers recorded the words in three different carrier sentences, and each test had different carrier sentences 共see Table II兲. Each sentence was spoken at three rates: slow, normal, and fast. Following Hirata 共2004a兲 and Port 共1977兲, speakers were given the following definition of the speaking rates: “tempo that is relaxed and comfortable” for the normal rate, “slowest tempo possible while keeping the sentence flowing together without 3840 J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007 obviously inserting breaks between words” for the slow rate, and “fastest tempo possible without making errors” for the fast rate. Analyses on duration of sentences and target vowels confirmed 共1兲 that speakers spoke with three distinct speaking rates, and 共2兲 that they made a clear distinction between short and long vowels within each rate. The sentence speaking rates across speakers were 239 共SD⫽24.8兲, 136 共SD ⫽15.2兲, and 105 共SD⫽6.4兲 ms/mora for slow, normal, and fast rates, respectively, and the effect of rate was significant 关F共2 , 66兲 = 435.62, p ⬍ 0.001兴. Comparisons of any two rates showed significant differences 关p ⬍ 0.001兴. The effect of speaker was not significant 关F共1 , 66兲 = 0.17, p = 0.686兴, although there was a significant interaction of rate and speaker 关F共2 , 66兲 = 4.29, p = 0.018兴. The two speakers’ rates did not significantly differ for slow 共244 vs 235 ms/ mora兲 and fast rates 共106 vs 103 ms/ mora兲, but their normal rates were different 共128 vs 145 ms/ mora兲. The important point here, however, is that there were three distinct speaking rates across speakers. Analysis on target vowel duration also confirmed that the three speaking rates were distinct for both speakers and for both short and long vowels. Each speaker’s short vowels were significantly different among the three rates 共Male 3: 152 vs 75 vs 64 ms; Female 3: 148 vs 83 vs 59 ms for slow, normal, and fast rates, respectively兲, and so were the long vowels 共Male 3: 405 vs 165 vs 141 ms; FeHirata et al.: Japanese vowel length in varied rates TABLE III. Summary of significant main effects and interactions in 4 ⫻ 2 ⫻ 3 ANOVA. Effect Test Rate Test⫻ rate Group⫻ test Group⫻ rate Group⫻ test⫻ rate F df p 104.83 67.95 11.24 2.51 2.23 3.37 共1,58兲 共2,116兲 共2,116兲 共3,58兲 共6,116兲 共6,116兲 0.000 0.000 0.000 0.068 0.045 0.004 male 3: 341 vs 165 vs 109 ms兲关p ⬍ 0.01兴. These results were very similar to those of the materials in Hirata 共2004a兲, a subset of which was used for the present training stimuli. An NJ speaker identified vowel length of these test stimuli with 98.9% accuracy. The pretest and post-test each included 5 vowels⫻ 2 lengths⫻ 3 rates⫻ 3 carrier sentences⫻ 2 speakers for a total of 180 trials. Each test consisted of six blocks of 30 trials. Within a given block, every trial used the same carrier sentence, spoken by the same speaker 共Table II兲. The stimuli were presented in a random order across rates. No words, sentences, or speakers from training were used in the tests. The pretest and the post-test used identical words but different carrier sentences. Each block was preceded by two examples using the same nonsense words used in training 共e.g., /meme/ and /memeb/兲. Immediately after each example, the answer, either “short” or “long,” was displayed on the screen. Participants were asked to take a three-minute break after the first three blocks. The testing procedure was the same as the training procedure, except that participants did not receive any feedback on their responses. D. Analysis Raw percent correct test scores were transformed into rationalized arcsine units 共RAU; Studebaker, 1985兲 to correct nonlinearity of the test score scale. A mixed-design Analysis of Variance 共ANOVA兲 was conducted with these RAU scores. Group 共slow-only, fast-only, slow-fast, control兲 was a between-subjects factor, and test 共pretest, post-test兲 and rate 共slow, normal, fast兲 were repeated-measures factors. If there is an overall effect of training, a significant group⫻ test interaction should be found. Furthermore, if an effect of training is found only for specific rates, a significant group ⫻ test⫻ rate should be found. Figures were made with the raw percent correct scale for ease of interpretation. we can say that participants were less able to identify length of vowels in the same words spoken at faster rates than at slower rates. A significant main effect was found for test 共pretest: 65.8%; post-test: 73.6%兲, and a significant test⫻ rate interaction was found in ANOVA 共Table III兲, indicating that participants showed significant improvement from the pretest to the post-test, but that the amount of improvement differed across rates. The mean improvement, averaged over groups, was highest for the slow rate 共from 69.9 to 81.1%兲 共11.2 percentage points improvement; below expressed as % improvement兲 关p ⬍ 0.001兴, less for the normal rate 共7.8% improvement from 64.9 to 72.7%兲 关p ⬍ 0.001兴, and lowest for the fast rate 共4.1 % improvement from 62.7 to 66.8%兲 关p = 0.001兴. A group⫻ test interaction, which would indicate the overall effectiveness of training, was only marginally significant 共Table III兲. The post-test scores were significantly higher than the pretest scores for all four groups 共slow-fast training group: 65.5 vs 74.6% 关t = 6.52, p ⬍ 0.001兴; slow-only training group: 66.7 vs 75.3% 关t = 6.86, p ⬍ 0.001兴; fast-only training group: 65.5 vs 73.3% 关t = 4.95, p ⬍ 0.001兴; control group: 65.6 vs 69.9% 关t = 2.43, p = 0.028兴兲. Differences among the four groups in the overall amount of improvement between the pretest and post-test scores were small, as shown in Fig. 1. III. RESULTS The results of ANOVA are summarized in Table III. With regard to the participants’ performance on the three speaking rates 共question 1兲, a significant effect of rate was found. The mean test scores were significantly higher for the slow 共75.6%兲 than normal rate 共68.8%兲, and for the normal than fast rate 共64.8%兲. It is important to note that the tests for the three rates included identical sets of words and sentences, and the only difference was that of the speaking rates. Thus, 3841 J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007 FIG. 1. Overall pretest 共white boxes兲 and post-test 共dark boxes兲 scores of the four groups. All test rates combined. The horizontal dashed line indicates the chance level performance. Hirata et al.: Japanese vowel length in varied rates FIG. 2. Percent correct test scores of the four groups by rate 共a: slow; b: normal; c: fast兲. Significant difference between pretest 共white boxes兲 and post-test 共dark boxes兲 关*** p = ⬍ 0.001; ** p = ⬍ 0.01; * p = ⬍ 0.05兴, nonsignificant difference between pretest and post-test 关n.s. p ⬎ 0.05兴. The horizontal dashed lines indicate the chance level performance. The overall improvement made by the control group was due to effects of repetition and exposure, since they took two tests. As mentioned in the introduction, training was said to be “effective” if the amount of improvement made by the individual trained groups was significantly greater than that made by the control group. Dunnett’s pairwise multiple comparison t-tests 共which contrasted the three training groups against the control group兲 indicated that the amount of improvement made by the slow-fast training group 共9.1% improvement兲 was significantly greater than that by the control 共4.3%兲 关p = 0.040兴, indicating a genuine effect of training. However, the slow-only training group’s improvement 共8.6%兲 was only marginally greater than the control group’s improvement 关p = 0.059兴. The fast-only training group’s improvement 共7.8%兲 was not significantly greater than the control group’s improvement 关p = 0.116兴. Thus, with regard to question 共2兲, we found that training with sentences, particularly slow-fast training, was indeed effective in enabling participants to improve their perceptual ability to identify Japanese vowel length contrast. With regard to question 共3兲, we found that, as far as the overall test scores were concerned, two-rate training was advantageous to one-rate training: slow-fast training was most effective, while only a marginal effect was found for slow-only training and no effect was found for fast-only training. A significant group⫻ test⫻ rate interaction was found in ANOVA 共Table III兲, indicating that the four groups’ differing amount of improvement from the pretest to the post-test depended on the speaking rate. Below, the four groups’ test scores were examined separately for each rate. Robust effects of training were found for the slow rate, as shown in Fig. 2共a兲. All of the three training groups’ scores improved significantly from the pretest to the post-test: 12.3% improvement 共means= from 70.4 to 82.7%兲 for the fast-only training group 关p ⬍ 0.001兴; 15.5% improvement 共69.7 to 85.2%兲 for the slow-only training group 关p ⬍ 0.001兴; and 15.2% improvement 共70.2 to 85.4%兲 for the slow-fast training group 关p ⬍ 0.001兴. In contrast, the control group did not significantly improve their slow rate scores 共2.4% improvement from 69.4 to 71.8%兲 关p = 0.230兴. Thus, 3842 J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007 significant effects of training were found for all three trained groups. It is interesting to note that the trained groups improved on the slow rate test scores regardless of the type of training they had received. The results of the normal rate test scores 关Fig. 2共b兲兴 were similar to those of the slow rate scores. All of the three training groups improved their scores significantly from the pretest to the post-test. A 10.9% improvement 共means = from 63.4 to 74.3%兲 was made by the fast-only training group 关p ⬍ 0.001兴, 7.4% improvement 共65.5 to 72.9%兲 by the slow-only training group 关p = 0.007兴, and 8.0% improvement 共64.4 to 72.4%兲 by the slow-fast training group 关p = 0.001兴. In contrast, the control group’s 4.9% improvement 共66.2 to 71.1%兲 was not significant 关p = 0.072兴. Thus, effects of training on the normal rate test scores were found for all three trained groups. Compared to the slow rate test scores, however, the amount of improvement was less for the normal rate. It is noteworthy that the trained groups improved on the normal rate scores even though none of them had received normal rate materials in training. The results of the fast rate test scores 关Fig. 2共c兲兴 differed from those of the other rates. The fast-only training group, rather unexpectedly, did not significantly improve on the fast rate test scores 共1.6% improvement from 62.6 to 64.2%兲 关p = 0.524兴, as was the case for the control group 共5.6% improvement from 61.5 to 67.1%兲 关p = 0.061兴. In contrast, the slow-only training group made a significant 4.0% improvement 共64.7 to 68.7%兲 关p = 0.046兴, and finally, the 5.5% improvement made by the slow-fast training group 共61.9 to 67.4%兲 was statistically robust 关p = 0.013兴. In summary, regarding question 共4兲 from the introduction, we found that the perceptual ability of NE speakers generalized from trained rate共s兲 to tested rates. For example, slow-fast training showed the strongest effects not only on the participants’ overall scores, but also on all of the three rate test scores separately, including the normal rate which was not used in training. Slow-only training improved participants’ performance most prominently on the slow and normal rate test stimuli, and much less but still significantly Hirata et al.: Japanese vowel length in varied rates on the fast rate stimuli, even though their overall, cross-rate scores did not show a clear effect of training as a whole. Least effective was fast-only training where an overall effect of training was not found. Examining each rate separately, effects of fast-only training were found in the slow and normal rate test scores, but not in the fast rate scores. IV. DISCUSSION AND CONCLUSIONS This study examined the extent to which training with sentences of different rates helped native English speakers to learn to perceive Japanese vowel length contrast in the sentence context. One major finding, related to our first question, was that the speaking rate of sentences was a crucial factor in participants’ perceptual abilities. The participants’ test scores were lower for faster rates, and the amount of improvement shown by trained participants was also smaller for faster rates. These results are consistent with previous findings by Tajima et al. 共2005兲 and Hirata 共2005a and 2005b兲, and suggest that even if non-native speakers are able to perceive L2 contrasts at a slow rate, this does not guarantee the same ability at faster rates. The second major finding, related to our second question, was that non-native speakers can learn to perceive L2 contrasts given only sentence input, supporting Hirata 共2004b兲. The present slow-fast training, in particular, which provided only sentences but no words in isolation, was found to improve the identification of Japanese vowel length contrast in the sentence context. The overall improvement of the slow-fast training group was significantly greater than that of the control group, which did not participate in training. Slow-only and fast-only training also showed some limited effects 共discussed further, below兲. This study provides additional evidence that adults are capable of learning difficult L2 contrasts with laboratory training, and not only with syllables or words in isolation 共Jamieson and Morosan, 1986; Logan et al., 1991; Bradlow et al., 1997; Tajima et al., 2002; Wang et al., 1999兲, but also with sentences 共Hirata, 2004b and 2004c兲. It should be noted, however, that the overall effect of training in the present study was fairly small, as indicated by the improvement of 9.1 percentage points for slow-fast training. This is less improvement than that shown in some previous studies of L2 contrast training that used a twoalternative forced choice identification method. For example, Bradlow et al.’s 共1997兲 subjects showed a significant improvement of 16 percentage points in distinguishing English / [ / and /1/. 共The mean pretest score in Bradlow et al. 共1997兲 was 65%, comparable to the pretest score of 65.8% in the present study.兲 On the lower end, Logan et al.’s 共1991兲 subjects showed a significant improvement of 7.8 percentage points 共though their mean pretest score was higher: 78.1%兲. The present study’s low level of improvement might partially be due to the fact that the test stimuli were embedded in sentences. This makes sense given that trained participants consistently showed less improvement in the sentence than word context in Hirata 共2004b兲. However, the crucial role of sentence training cannot be overlooked. Hirata 共2004b兲 found that participants who were trained with sentences 3843 J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007 made robust perceptual improvement in both the isolatedword and sentence contexts, but those who were trained only with isolated words made less improvement in the sentence context. Thus, even though training with sentences is more time consuming, there is a practical advantage to using sentences in training, if our ultimate goal includes enabling nonnative speakers to perceive difficult L2 contrasts in fluent speech. Following Hirata 共2004b兲, it would be interesting to examine whether the present sentence training would improve participants’ ability to perceive isolated words as well as words in sentences, as in the present experiment. The third major finding of the present study relates to the differential effects of the three types of training, as addressed in our third question. The participants who received fast-only training improved least on their overall, cross-rate test scores. Their overall improvement 共7.8 percentage points兲 was not significantly greater than that of the control group 共4.3 percentage points兲. Slow-fast training was slightly more beneficial than slow-only training. The overall, crossrate improvement made by the slow-fast training group 共9.1 percentage points兲 was significantly greater than that made by the control group, but the slow-only training group’s improvement 共8.6 percentage points兲 was only marginally greater than the control group’s. In summary, only the slowfast training group improved significantly on the total test scores, as well as on the scores for each of the three rates. This result, that training with two rates was more effective than training with one rate 共slow-only or fast-only兲, supports Pisoni and Lively’s 共1995兲 high phonetic variability hypothesis. Their original hypothesis concerned the effectiveness of training with variability in talker and phonetic context. The present study provides support for the effectiveness of training with variability in speaking rate: the more variability in the rates of training stimuli, the more effective for non-native speakers’ perceptual learning. That slow-only training was slightly advantageous over fast-only training might also be consistent with the high variability hypothesis: variability of vowel as well as sentence duration in the present training was higher for the slow than fast rate 共see Hirata, 2004a兲. This claim, however, should be examined in the future by comparing effects of high-variability fast rate training versus low-variability slow rate training. The above conclusions in support of Pisoni and Lively’s 共1995兲 high variability hypothesis merit further discussion. First, the difference among the three types of training was quite small 共Fig. 1兲. Second, regardless of the type of training, participants showed a similar amount of improvement for the slow and normal rates 共Fig. 2兲. Note that no participant heard normal-rate sentences in training, but that all trained participants improved their test scores for the normal rate. These results, relevant to our fourth question, imply that even beginning L2 learners are able to normalize different speaking rates fairly easily within the range of slow to normal speaking rates. The ability to normalize speaking rate in native languages has been examined extensively, as reviewed in the introduction, and this ability seems to be utilized for L2 speech within a limited range of speaking rates even at an early stage of L2 learning. This result is in line with Kato et al.’s 共2004兲 finding that, with perceptual training, NE speakHirata et al.: Japanese vowel length in varied rates ers were able to learn to adapt their categorical boundary positions of Japanese vowel length contrast according to different contexts. Also worthy of discussion are the results of fast-only training, which perhaps are counterintuitive. Previous studies suggested that abilities acquired during L2 training are largely to deal with the kinds of stimuli that had been presented in training 共e.g., synthetic versus natural speech, word versus sentence stimuli兲 共Strange and Dittmann, 1984; Morosan and Jamieson, 1989; Hirata, 2004c兲. This would predict that fast-only training should improve scores for the fast rate, but not the other rates. However, the opposite results were found: the fast-only training participants improved on slow and normal stimuli, but not on fast stimuli. In addition, this training was found to be least effective. Jamieson and Morosan’s 共1986兲 perceptual fading technique, which successfully enabled Canadian francophones to identify the English /ð/-// contrast, is useful in understanding these results. Their method initially reduces stimulus uncertainty by providing the longest consonant frication in the /ð#/-/#/ syllable continuum, and as training progresses, increases stimulus uncertainty by including syllables with progressively shorter frication. This method helps learners to initially focus their attention on critically relevant cues, and then to gradually learn to handle within-category acoustic variation. It is possible, then, that participants in the present fast-only training suffered from not having first been provided with critically relevant cues with little uncertainty. The durational difference between short and long vowels is about 50 ms for the fast rate, compared to about 150 ms for the slow rate 共Hirata, 2004a兲 and, thus, fast-only training participants might have been unable to grasp the most critical cue, which is vowel duration, solely from the fast stimuli.2 Another possible interpretation of the ineffectiveness of fast-only training as compared to slow-only training might concern short-term memory effects. It is possible that attentional resources available in short-term memory are limited when learning to perceive unfamiliar non-native contrasts, and that there is a higher demand in processing fast than slow rate sentences. If so, then because of this higher demand, the fast rate speech signals might have degraded more quickly than the slow rate speech signals in short-term memory, and this might be why participants with fast-only training were unable to learn the critical acoustic cue of vowel duration as effectively. We expected from Pisoni and Lively’s 共1995兲 high phonetic variability hypothesis that single rate training, whether fast-only or slow-only, would have the same effect on overall perceptual learning. However, the differential results of fast-only and slow-only training cannot be explained so simply by this hypothesis in regards to the issue of speaking rate variability. The present results may suggest that different processing and memory demands for slow versus fast rate stimuli are at play, and the amount and quality of information transmitted from shortterm memory to long-term memory do matter. This factor may augment Pisoni and Lively’s 共1995兲 high phonetic variability hypothesis in understanding how temporal characteristics of stimuli play a role in the ultimate formation of an L2 perceptual category. 3844 J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007 On a final note, in order to achieve more robust perceptual learning, we might wish to train subjects on sentences spoken at a greater number of speaking rates. Introducing several more speakers would also create additional, possibly useful, speaking rate variation. A related issue is the distribution of stimuli among slow, normal, and fast rates. The speaking rates of the present test materials were 239, 136, and 105 ms/ mora for slow, normal, and fast rates, respectively, showing a greater difference between the slow and normal rates than the difference between the normal and fast rates. It is possible that differential effects of training might appear more strongly if fast rate materials are even faster or if three speaking rates are distributed more equally. A future study should also examine whether non-native listeners benefit from randomizing rates at every trial instead of by block, as the former has been found to provide more difficulty than the latter 共Tajima et al., 2005兲. Finally, the carrier sentence in the present training was held constant for the purpose of examining the precise effect of speaking rate, but a future study should also examine whether providing different carrier sentences would benefit learners. Although additional research is thus necessary, the present study reveals that sentential speaking rate is an important factor in L2 learners’ acquisition of duration-based distinctions. ACKNOWLEDGMENTS This study was supported by National Science Foundation Grant No. BCS0418246, given to Y.H. We thank Jon Bernard for his assistance in creating the training program and editing the manuscript, and Jacob Whiton, Connor Forbes, Lucia Livak, Jess Worby, Carol Glenn, Stephen Lambacher, and Spencer Kelly for their help in various stages of this project. We also thank reviewers for their insightful comments on the earlier version of this manuscript. See Hirata 共2004a兲 for more details of duration measurements of these training materials. A subset of the materials was also used for NJ speakers’ perception in Hirata and Lambacher 共2004兲. In that study, NJ speakers identified short and long vowels of these materials with 98.1% accuracy. 2 Here, we used differences in absolute vowel duration as an example of speaking rate differences, but what the precise measure of variability should ideally be is an empirical question. One could argue that relational measures, such as ratios of short and long vowels, might be more appropriate than the measure of absolute duration. The present experiment showed generalizability of learning from one rate to another, which might suggest that relational measures are more suitable as a measure of variability. However, other evidence suggests that absolute duration plays an important role for non-native listeners 共Hirata, 2005a; Tajima, 2006兲, as well as for native listeners when stimuli are impoverished 共Hirata and Lambacher, 2004兲. 1 Bradlow, A. R., Pisoni, D. B., Yamada, R. A., and Tohkura, Y. 共1997兲. “Training Japanese listeners to identify English /r/-/l/: IV. Some effects of perceptual learning on speech production,” J. Acoust. Soc. Am. 101, 2299–2310. Denes, P. 共1955兲. “Effect of duration on the perception of voicing,” J. Acoust. Soc. Am. 27, 761–764. Francis, A. L., and Nusbaum, H. C. 共2002兲. “Selective attention and the acquisition of new phonetic categories,” J. Exp. Psychol. Hum. Percept. Perform. 28, 349–366. Fujisaki, H., Nakamura, K., and Imoto, T. 共1975兲. “Auditory perception of duration of speech and non-speech stimuli,” in Auditory analysis and perception of speech, edited by G. Fant and M. A. A. Tatham 共Academic Press, London兲, pp. 197–219. Hirata et al.: Japanese vowel length in varied rates Gay, T. 共1981兲. “Mechanisms in the control of speech rate,” Phonetica 38, 148–158. Gay, T., and Hirose, H. 共1973兲. “Effect of speaking rate on labial consonant production,” Phonetica 27, 44–56. Gay, T., Ushijima, T., Hirose, H., and Cooper, F. S. 共1974兲. “Effect of speaking rate on labial consonant-vowel articulation,” J. Phonetics 2, 47– 63. Gottfried, T. L., Miller, J. L., and Payton, P. E. 共1990兲. “Effect of speaking rate on the perception of vowels,” Phonetica 47, 155–172. Han, M. 共1992兲. “The timing control of geminate and single stop consonants in Japanese: A challenge for nonnative speakers,” Phonetica 49, 102–127. Hirata, Y. 共1990a兲. “Perception of geminated stops in Japanese word and sentence levels,” The Bulletin of the Phonetic Society of Japan 194, 23– 28. Hirata, Y. 共1990b兲. “Perception of geminated stops in Japanese word and sentence levels by English-speaking learners of Japanese language,” The Bulletin of the Phonetic Society of Japan 195, 4–10. Hirata, Y. 共2004a兲. “Effects of speaking rate on the vowel length distinction in Japanese,” J. Phonetics 32, 565–589. Hirata, Y. 共2004b兲. “Training native English speakers to perceive Japanese length contrasts in word versus sentence contexts,” J. Acoust. Soc. Am. 116, 2384–2394. Hirata, Y. 共2004c兲. “Computer assisted pronunciation training for native English speakers learning Japanese pitch and durational contrasts,” Comp. Assist. Lang. Learning 17, 357–376. Hirata, Y. 共2005a兲. “Effects of speaking rate on native English speakers’ identification of Japanese vowel length,” ISCA Workshop on Plasticity in Speech Perception, A36. Hirata, Y. 共2005b兲. “Factors affecting native English speakers’ identification of Japanese vowel length,” First ASA Workshop on Second Language Speech Learning, 13. Hirata, Y., and Lambacher, S. G. 共2004兲. “Role of word-external contexts in native speakers’ identification of vowel length in Japanese,” Phonetica 61, 177–200. Hirata, Y., and Tsukada, K. 共2004兲. “The effects of speaking rates and vowel length on formant movements in Japanese,” in Proceedings of the 2003 Texas Linguistics Society Conference: Coarticulation in Speech Production and Perception, edited by A. Agwuele, W. Warren, and S. H. Park 共Cascadilla Proceedings Project, Somerville, MA兲, pp. 73–85. Jamieson, D. G., and Morosan, D. E. 共1986兲. “Training non-native speech contrasts in adults: Acquisition of the English /ð/-// contrast by francophones,” Percept. Psychophys. 40, 205–215. Johnson, T. L., and Strange, W. 共1982兲. “Perceptual constancy of vowels in rapid speech,” J. Acoust. Soc. Am. 72, 1761–1770. Kato, H., Tajima, K., Rothwell, A., Akahane-Yamada, R., and Munhall, K. 共2004兲. “Perception of phonemic length contrasts in Japanese with or without a carrier sentence by native and non-native listeners,” Proceedings of the 18th International Congress on Acoustics, I-609-612. Kato, H., Wilson, A., Tajima, K., and Akahane-Yamada, R. 共2005兲. “Effects of training conditions on the learning of phonemic length contrasts in Japanese: Speaking rate and presentation context,” Nihon Onkyoo Gakkai Kooen Ronbunshuu, 851–852. Kawai, G., and Hirose, K. 共2000兲. “Teaching the pronunciation of Japanese double-mora phonemes using speech recognition technology,” Speech Commun. 30, 131–143. Kondo, Y. 共1995兲. “Production of schwa by Japanese speakers of English: A crosslinguistic study of coarticulatory strategies,” doctoral dissertation 共University of Edinburgh, UK兲. Landahl, K., and Ziolkowski, M. 共1995兲. “Discovering phonetic units: Is a picture worth a thousand words?” Papers from the 31st Regional Meeting of the Chicago Linguistic Society 1, 294–316. Landahl, K., Ziolkowski, M., Usami, M., and Tunnock, B. 共1992兲. “Interactive articulation: Improving accent through visual feedback,” in The Proceedings of the Second International Conference on Foreign Language Education and Technology, edited by I. Shinjo et al. 共The Language Laboratory Association of Japan, Kasugai, Aichi, Japan兲, pp. 283–292. Logan, J. S., Lively, S. E., and Pisoni, D. B. 共1991兲. “Training Japanese listeners to identify English /r/ and /1/: A first report,” J. Acoust. Soc. Am. 89, 874–886. Magen, H. S., and Blumstein, S. E. 共1993兲. “Effects of speaking rate on the vowel length distinction in Korean,” J. Phonetics 21, 387–409. Miller, J. L. 共1987兲. “Rate-dependent processing in speech perception,” in 3845 J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007 Progress in the Psychology of Language, Vol. 3, edited by A. Ellis 共Erlbaum Associates, Hillsdale兲, pp. 119–157. Miller, J. L., and Liberman, A. M. 共1979兲. “Some effects of later-occurring information on the perception of stop consonant and semivowel,” Percept. Psychophys. 25, 457–465. Morosan, D., and Jamieson, D. G. 共1989兲. “Evaluation of a technique for training new speech contrasts: Generalization across voices, but not wordposition or task,” J. Speech Hear. Res. 32, 501–511. Newman, R. S., and Sawusch, J. R. 共1996兲. “Perceptual normalization for speaking rate: Effects of temporal distance,” Percept. Psychophys. 58, 540–560. Oguma, R. 共2000兲. “Perception of Japanese long vowels and short vowels by English speaking learners,” Japanese-Language Education Around the Globe 10, 43–55. Pickett, E. R., Blumstein, S. E., and Burton, M. W. 共1999兲. “Effects of speaking rate on the singleton/geminate consonant contrast in Italian,” Phonetica 56, 135–157. Pisoni, D. B. 共1997兲. “Some thoughts on “normalization” in speech perception,” in Talker Variability in Speech Processing, edited by K. Johnson and J. W. Mullennix 共Academic Press, New York兲, pp. 9–32. Pisoni, D. B., and Lively, S. E. 共1995兲. “Variability and invariance in speech perception: A new look at some old problems in perceptual learning,” in Speech Perception and Linguistic Experience: Issues in Cross-Language Speech Research, edited by W. Strange 共York, Timonium, MD兲, pp. 433– 459. Port, R. F. 共1977兲. The Influence of Speaking Tempo on the Duration of Stressed Vowel and Medial Stop in English Trochee Words 共Indiana University Linguistics Club, Bloomington, IN兲, pp. 1-77. Port, R. F., and Dalby, J. 共1982兲. “Consonant/vowel ratio as a cue for voicing in English,” Percept. Psychophys. 32, 141–152. Sawusch, J. R., and Newman, R. S. 共2000兲. “Perceptual normalization for speaking rate II: Effects of signal discontinuities,” Percept. Psychophys. 62, 285–300. Strange, W., and Dittmann, S. 共1984兲. “Effects of discrimination training on the perception of /r-1/ by Japanese adults learning English,” Percept. Psychophys. 36, 131–145. Studebaker, G. A. 共1985兲. “A “rationalized” arcsine transform,” J. Speech Hear. Res. 28, 455–462. Tajima, K. 共2006兲. “Perceiving prosodic units in a second language: A crosslanguage comparison,” International Workshop on the Auditory Processing of Prosodic Features and its Applications: Acquisition of Linguistic Organization and Human Audition, 40. Tajima, K., Rothwell, A., and Munhall, K. G. 共2002兲. “Native and nonnative perception of phonemic length contrasts in Japanese: Effect of identification training and exposure,” J. Acoust. Soc. Am. 112, 2387. Tajima, K., Kato, H., Yamada, R., Rothwell, A., and Munhall, K. G. 共2005兲. “Native and non-native perception of phonemic length contrasts in Japanese: Effect of speaking rate,” Nihon Ninchi-Shinri Gakkai Dai 3-kai Taikai Happyoo Ronbunshuu, 69. Toda, T. 共1997兲. “Strategies for producing mora timing by non-native speakers of Japanese,” in Acquisition of Japanese as a Second Language, edited by Daini Gengo Shuutoku Kenkyuukai 共Bonjinsha, Tokyo兲, pp. 157–197. Tsukada, K. 共1999兲. “An acoustic phonetic analysis of Japanese-accented English,” doctoral dissertation 共Macquarie University兲. Ueyama, M. 共2000兲. “Prosodic transfer: An acoustic study of L2 English vs L2 Japanese,” doctoral dissertation 共University of California at Los Angeles兲. Vance, T. J. 共1987兲. An Introduction to Japanese Phonology 共State University of New York Press, Albany, NY兲. Verbrugge, R. R., and Shankweiler, D. 共1977兲. “Prosodic information for vowel identity,” Haskins Laboratories Status Report on Speech Research SR-51/52, 27–35. Wang, Y., Spence, M. M., Jongman, A., and Sereno, J. A. 共1999兲. “Training American listeners to perceive Mandarin tones,” J. Acoust. Soc. Am. 106, 3649–3658. Wayland, S. C., Miller, J. L., and Volaitis, L. E. 共1994兲. “The influence of sentential speaking rate on the internal structure of phonetic categories,” J. Acoust. Soc. Am. 95, 2694–2701. Yamada, T., Yamada, R. A., and Strange, W. 共1995兲. “Perceptual learning of Japanese mora syllables by native speakers of American English: Effects of training stimulus sets and initial states,” Proceedings of the 14th International Congress of Phonetic Sciences 1, 322–325. Hirata et al.: Japanese vowel length in varied rates
© Copyright 2026 Paperzz