Training native English speakers to identify

Training native English speakers to identify Japanese vowel
length contrast with sentences at varied speaking ratesa)
Yukari Hirata,b兲 Elizabeth Whitehurst, and Emily Cullings
Colgate University, Department of East Asian Languages and Literatures, Hamilton, New York 13346
共Received 18 August 2006; revised 26 March 2007; accepted 26 March 2007兲
Native English speakers were trained to identify Japanese vowel length in three types of training
differing in sentential speaking rate: slow-only, fast-only, and slow-fast. Following Pisoni and
Lively’s high phonetic variability hypothesis 关Pisoni, D. B., and Lively, S. E., Speech Perception
and Linguistic Experience, 433–459 共1995兲兴, higher stimulus variability by means of training with
two rates was hypothesized to aid learners in adapting to speech rate variation more effectively than
training with only one rate. Trained participants identified the length of the second vowel of
disyllables, short or long, embedded in a sentence of the respective rate, and received immediate
feedback. The three trained groups’ abilities before and after training were examined with tests
containing sentences of slow, normal, and fast rates, and were compared with those of a control that
was not trained. A robust effect of slow-fast training, a marginal effect of slow-only training, but no
significant effect of fast-only training were found in the overall test scores. Slow-fast and slow-only
training showed small advantages over fast-only training on the fast-rate test scores, while effects
for all three training types were found on the slow- and normal-rate test scores. The degree to which
the results support the high phonetic variability hypothesis is discussed. © 2007 Acoustical Society
of America. 关DOI: 10.1121/1.2734401兴
PACS number共s兲: 43.71.Hw, 43.71.Es 关ARB兴
I. INTRODUCTION
This study examined the way non-native speakers perceive Japanese vowel length contrast in sentences produced
at various speaking rates. The vowel length distinction 共short
versus long兲 is phonemic in Japanese, e.g., /kuTo/ “black”
versus /kuTob/ “hardships,” and all five short vowels /i e a o
u/ contrast with the corresponding long vowels 共/ib eb ab ob
ub/兲 共Vance, 1987兲. Long vowels are 2.2–3.2 times longer in
duration than short vowels, whereas only small differences
have been observed between the formant frequencies of short
and long vowels 共Kondo, 1995; Tsukada, 1999; Ueyama,
2000; Hirata, 2004a; Hirata and Tsukada, 2004兲. This vowel
length distinction, as well as a consonant length distinction
共e.g., /kata/ “shoulder” versus /katba/ “won”兲 has been found
difficult for native English 共NE兲 speakers to acquire 共e.g.,
Landahl et al., 1992; Han, 1992; Landahl and Ziolkowski,
1995; Yamada et al., 1995; Toda, 1997; Oguma, 2000;
Tajima et al., 2002; Hirata, 2004b兲. Native Japanese speakers’ primary perceptual cue to the Japanese vowel length
distinction is vowel duration 共Fujisaki et al., 1997兲, but a
long vowel spoken quickly can be shorter than a short vowel
spoken slowly 共Hirata, 2004a兲. Speaking rate variation is
thought to contribute to the difficulty of acquiring the length
distinctions.
Research over the past 50 years has shown how a native
speaker’s perception of a phonetic segment is affected by the
a兲
Portions of this work were presented at the International Conference on
Spoken Language Processing, Pittsburgh, September 2006, and at the
fourth Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan, Honolulu, Hawaii, December 2006.
b兲
Author to whom correspondence should be addressed. Electronic mail:
[email protected]
J. Acoust. Soc. Am. 121 共6兲, June 2007
Pages: 3837–3845
rates of the segment and of the surrounding context 共Denes,
1955; Port, 1977; Verbrugge and Shankweiler, 1977; Miller
and Liberman, 1979; Port and Dalby, 1982; Johnson and
Strange, 1982; Gottfried et al., 1990; Newman and Sawusch,
1996; Sawusch and Newman, 2000兲. For example, Pickett et
al. 共1999兲 found that native Italian speakers’ perception of
Italian single and geminate stops was based on the stop closure duration relative to the preceding vowel. Listeners also
use the speaking rate of a sentence to perceive native language 共L1兲 contrasts of which the primary acoustic correlate
is duration 共e.g., Miller, 1987, and Wayland et al., 1994, for
English /p/-/b/ contrast; Magen and Blumstein, 1993, for Korean short and long vowels兲. These studies in L1 perception
show rate normalization: native listeners do not identify a
phonetic segment based on the absolute duration of the segment. Rather, their perception takes into account both the
duration of neighboring segments and the segments that are
contrastive in the language.
Consistent with the findings above, native Japanese 共NJ兲
speakers use the rate of a sentence as a perceptual cue for
distinguishing Japanese consonant and vowel length 共Hirata,
1990a; Hirata and Lambacher, 2004兲. NJ listeners in Hirata
共1990a兲 identified the pair of words /ita/ “stayed” versus
/itba/ “went” based on the durational ratio of the stop closure
to the preceding vowel /i/ when the words were presented in
isolation. In a second experiment, an edited word 关it共b兲ta兴 in
which the durational ratio was identified as ambiguous by NJ
listeners was embedded in a carrier sentence produced at
different speaking rates. NJ listeners identified this ambiguous word as /ita/ or /itba/ based on the rate of the sentence.
When the sentence was spoken slowly, the word was clearly
identified as /ita/, and when the sentence was spoken faster,
0001-4966/2007/121共6兲/3837/9/$23.00
© 2007 Acoustical Society of America
3837
the word was clearly identified as /itba/. The perceptual cue
present in the sentence rate superseded the ambiguous cue
present in the isolated word.
Compared to rate normalization by native listeners, less
is known as to how second language 共L2兲 learners normalize
speech rate and learn to make duration-based distinctions in
sentences. In Hirata 共1990b兲, when the Japanese words /ita/
and /itba/ were presented in isolation, NE speakers distinguished the words, similarly to NJ speakers, using the durational ratio of the stop closure to the preceding vowel as a
perceptual cue. However, when the words were embedded in
a sentence spoken at different rates, NE speakers were unable to use the rate of the sentences as a perceptual cue as NJ
listeners did. Thus, NE learners of Japanese have difficulty in
normalizing rate over a sentence in Japanese, even though
they can use the localized perceptual cue when isolated
words are presented.
What method would most effectively enable NE speakers to normalize the rate of sentences and identify vowel
length in an embedded word accurately? Given the results of
Hirata 共1990b兲, training L2 learners on isolated syllable or
word contexts might not be the best method. The development of training methods for difficult L2 contrasts has been
one of the most fruitful areas of L2 research 共English /[/-/l/:
Strange and Dittmann, 1984; Logan et al., 1991; Bradlow et
al., 1997; English /ð/-/␪/: Jamieson and Morosan, 1986;
Mandarin tones: Wang et al., 1999; Korean stops: Francis
and Nusbaum, 2002; Japanese length contrasts: Yamada et
al., 1995; Kawai and Hirose, 2000; Tajima et al., 2002兲.
However, these studies provided L2 contrasts in isolated syllables or words. Investigation of effects of training using
sentence stimuli has only recently begun 共Hirata, 2004b;
Hirata, 2004c; Kato et al., 2005兲. Studying the L2 vowel
length distinction in a sentence context is particularly important, as research has shown that native listeners use the rate
of sentences for making duration-based distinctions 共Hirata,
1990a; Hirata and Lambacher, 2004兲. A recent study 共Kato et
al., 2004兲 showed that, after perceptual training with isolated
words, NE speakers learned to perceive short and long vowels in sentences, indicating that their ability gained for the
isolated word context generalized to the sentence context
共see also Hirata, 2004b兲. However, Tajima et al.’s 共2005兲
results showed that the accuracy of NE speakers’ perception
of Japanese length contrast was affected by the speaking rate
of test stimuli, and that this effect did not decrease after
training with isolated words.
Pisoni and Lively’s 共1995兲 high phonetic variability hypothesis provides a useful guide in the search for a method
which would enable NE speakers to successfully identify
Japanese vowel length. This hypothesis claims that the diversity of acoustic cues present in materials produced by multiple speakers helps rather than hinders non-native listeners
in forming new perceptual categories. Traditionally, stimulus
variability in voices and speaking rates was viewed as
troublesome noise, extraneous to the encoding of linguistic
units, and was deliberately excluded from experiments by the
use of carefully controlled materials 共Pisoni, 1997兲. However, evidence has accumulated that high stimulus variability
in voices and phonetic contexts is informative, rather than
3838
J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007
distracting, and actually assists perceptual learning 共Pisoni
and Lively, 1995; Bradlow et al., 1997兲. In the present study,
we tested this high variability hypothesis with regard to
speaking rate variation. We hypothesized that providing
variation in speaking rate helps non-native listeners to normalize rate and to learn to distinguish vowel length accurately.
The present study compared the relative effectiveness of
three types of perceptual training: one with only a slow rate,
one with only a fast rate, and one with both slow and fast
rates. Training materials were sentences naturally produced
by four NJ speakers at these rates. We chose this method
rather than creating various speaking rates with a synthesizer
because faster natural speech is not simply compressed
slower speech, but involves changes in the size and velocity
of various muscle contractions 共Gay and Hirose, 1973; Gay
et al., 1974; Gay, 1981兲. According to the hypothesis tested
in the present research, the variations that exist in natural
speech spoken at slow and fast rates are informative for nonnative listeners. The prediction was that speech materials
spoken at two rates 共slow and fast兲, contain more diverse and
rich acoustic cues than those spoken only at one rate 共slowonly or fast-only兲, and that this diversity aids non-native listeners in adjusting to different rates of speech and identifying
short/long vowels accurately.
Effects of the three types of training were examined by a
pretest and a post-test consisting of sentences produced by
two additional NJ speakers at three speaking rates: slow, normal, and fast. These three rates were included in order to
examine whether the ability gained through training with a
given rate or rates would generalize to rate共s兲 to which participants had not been exposed. The present study compared
each type of training with a control group that did not participate in training. Comparison with a control was important
because our pilot results indicated effects of repetition and
exposure, i.e., improvement of test scores simply by taking
the tests twice without training. Thus, training is said to be
“effective” if the amount of improvement from the pretest
and the post-test was significantly greater for the trained than
the control group. The study addressed the following questions:
共1兲 How do NE speakers perform in identifying Japanese
vowel length in carrier sentences spoken at three speaking rates 共slow, normal, and fast兲? Previous studies have
shown that rate effects interacted with word type 共Hirata,
2005a兲, vowel length 共i.e., short or long; Hirata, 2005b兲,
and context 共i.e., in isolation or in sentences; Tajima et
al., 2002 and Tajima et al., 2005兲. In these studies, identification accuracy in the sentence context is generally
lower for faster rates. The present study will examine if
this result is replicable.
共2兲 Does perceptual training with sentences, as opposed to
isolated words or syllables as in most previous studies,
aid NE speakers in identifying Japanese vowel length
accurately? This study will determine whether the
present three types of training with sentences enable effective L2 contrast learning.
共3兲 Does training with sentences spoken at two rates 共slow
Hirata et al.: Japanese vowel length in varied rates
TABLE I. Training stimuli.
Block and speaker
Show-only training
Session 1
Session 2
Session 3
Session 4
Fast-only training
Session 1
Session 2
Session 3
Session 4
Slow-fast training
Session 1
Session 2
Session 3
Session 4
共a兲–共h兲
共a兲–共h兲
共a兲–共h兲
共a兲–共h兲
Male 1
Female 1
Male 2
Female 2
共a兲–共h兲
共a兲–共h兲
共a兲–共h兲
共a兲–共h兲
Male 1
Female 1
Male 2
Female 2
共a兲–共h兲
共a兲–共h兲
共a兲–共h兲
共a兲–共h兲
Male 1
Female 1
Male 2
Female 2
Carrier sentence
Rate
ssssssssa
ssssssss
ssssssss
ssssssss
soko wa___to kaite aTimasu
共‘___ is written there.’兲
ffffffff
fffffiff
ffffffff
ffffffff
sfsfsfsf
sfsfsfsf
sfsfsfsf
sfsfsfsf
a
Indicates rate of stimuli that was presented within each of the eight blocks.
“s”⫽slow rate. “f”⫽fast rate. 共a兲–共h兲 indicate blocks.
and fast兲 improve overall perceptual accuracy more than
training with only one rate 共slow-only or fast-only兲? According to Pisoni and Lively’s 共1995兲 hypothesis, the
overall improvement should be higher for two-rate than
one-rate training.
共4兲 How does the perceptual ability of NE speakers generalize from trained rate共s兲 to tested rates? For example,
does slow-only training improve accuracy only for the
slow rate, or improve it for other rates as well? Does
slow-fast training improve accuracy only for these two
rates, but not for the untrained 共normal兲 rate?
II. METHODS
A. Participants
Sixty-two monolingual native speakers of American English, whose ages ranged from 18 to 22, were randomly assigned to one of the following four groups: slow-only training group 共n = 16; mean age⫽19.2兲, fast-only training group
共n = 16; mean age⫽19.2兲, slow-fast training group 共n = 14;
mean age⫽19.7兲, and a control group 共n = 16; mean age
⫽19.3兲. None of the participants had previously studied
Japanese or had extensive exposure to spoken Japanese. All
but one participant had studied one or two foreign languages,
usually Spanish and/or French, and one participant had studied five languages. None of the participants, however, had
native-level fluency in any foreign language. No participant
reported having any hearing problems. Participants were
paid for their participation.
B. Training procedure
Participants assigned to one of the three training groups
共slow-only, fast-only, and slow-fast兲 completed four training
sessions in addition to a pretest and a post-test. Participants
completed the experiment over a minimum period of 11 days
and a maximum of 17 days. Participants had any two consecutive training sessions at least 24 hours apart, and no
more than four days apart.
3839
J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007
Training materials were a subset of materials used in
Hirata 共2004a兲. The target words were nonsense Japanese
words in the form of /mVmV/ and /mVmVV/ 共V⫽/i, e, a, o,
u/; e.g., /mimi/ vs /mimib/兲, with the pitch accent always on
the first vowel. These words were embedded in a single carrier sentence spoken by four native Japanese speakers 共Male
1, Female 1, Male 2, and Female 2; Table I兲. For slow-rate
and fast-rate training materials, the speakers were instructed
to speak as slowly and as fast as possible, respectively. The
four speakers’ mean speaking rate was 263 ms/ mora for the
slow rate and 105 ms/ mora for the fast rate.1
Each type of training consisted of four sessions. Each
session contained 160 stimuli by one speaker. As described
in Table I, only slow-rate materials were used for slow-only
training, and only fast-rate materials of the same speakers
were used for fast-only training 关5 vowels⫻ 2 lengths⫻ 2
repetitions⫻ 8 blocks兴. For slow-fast training, both of those
slow- and fast-rate materials were used, but the total number
of trials was the same as that in other training 关2 rates⫻ 5
vowels⫻ 2 lengths⫻ 2 repetitions⫻ 4 blocks兴. For slow-fast
training, the rate of stimuli stayed constant within each
block, either slow or fast. Slow-rate and fast-rate blocks were
presented alternately 共see Table I for the block structure兲.
Each session was broken into 8 blocks of 20 trials, and
the first and the fifth blocks were preceded by three examples. When participants heard audio stimuli through headphones attached to a PC, the carrier sentence, soko wa___ to
kaite arimasu 共“___ is written there”兲 was displayed on the
screen. Participants were asked to identify whether the second vowel of the disyllables inserted in the underlined part
of the sentence was short or long. The participants received
feedback on each response during training. The feedback
consisted of a screen with the correct answer written out as
either “short” or “long” and the target word spelled out in
Romanized letters. Words with short vowels 共containing two
moras兲 had two dots over them, and words with a long vowel
共containing three moras兲 had three dots. When participants
answered correctly, the sign “Correct” appeared on the
screen, and it enabled the participants to go on to the next
Hirata et al.: Japanese vowel length in varied rates
TABLE II. Test stimuli.
Block and speaker
Carrier sentence
Rate
soTe ga ___ da to omoimasu
共‘I think that is ___.’兲
asoko ni ___ to aTimasu
共‘It says ___ over there.’兲
koko wa ___ dca aTimasen
共‘This is not ___.’兲
hontob ni ___ wa kaitenai
共‘It’s true that ___ is not written.’兲
soko de ___ to wa iwanakatba
共‘I didn’t say ___ there.’兲
kitbo ___ de wa nai debob
共‘It certainly won’t be ___.’兲
snfa
koTe ga ___ da to kikimabita
共’I heard that this is ___.’兲
koko ni ___ to aTimasu ne
共‘It says ___ here, right?’兲
aTe wa ___ dca nai desu jo
共‘I tell you, that’s not ___.’兲
zetbai ni ___ wa kakaTeteta
共‘___ was definitely written.’兲
sobite ___ to itbe kudasai
共‘And then, please say ___.’兲
tabun ___ de wa aTimasen
共‘It’s probably not ___.’兲
snf
Pretest
共a兲 Male 3
共b兲 Female 3
共c兲 Male 3
共d兲 Female 3
共e兲 Male 3
共f兲 Female 3
snf
snf
snf
snf
snf
Post-test
共a兲 Male 3
共b兲 Female 3
共c兲 Male 3
共d兲 Female 3
共e兲 Male 3
共f兲 Female 3
snf
snf
snf
snf
snf
a
“snf” indicates that stimuli of slow, normal, and fast rates were presented randomly within the block.
共a兲–共f兲 indicate blocks.
trial. When participants answered incorrectly, the sign
“Sorry…” appeared, and they were made to click “Play
again” to hear the stimulus repeated three additional times.
In each session, participants were asked to take a threeminute break after the fourth block.
C. Testing procedure
All of the four groups 共slow-only, fast-only, slow-fast,
and control兲 took a pretest and a post-test approximately two
weeks apart. Each test consisted of 180 stimuli. Each stimulus used one of ten words drawn from five pairs of real
Japanese words: /Tubi/-/Tubib/, /ise/-/iseb/, /Tika/-/Tikab/,
/kato/-/katob/, and /saju/-/sajub/ 共meaning “agate”-“ruby,”
“共name of a place兲”-“opposite gender,” “science”-“liquor,”
“transition”-“共surname兲,” and “hot water”-“left and right”兲,
all accented on the first vowel. The stimuli were recorded by
two NJ speakers who did not appear in the training sessions
共Male 3 and Female 3; Table II兲. The NJ speakers recorded
the words in three different carrier sentences, and each test
had different carrier sentences 共see Table II兲. Each sentence
was spoken at three rates: slow, normal, and fast. Following
Hirata 共2004a兲 and Port 共1977兲, speakers were given the following definition of the speaking rates: “tempo that is relaxed and comfortable” for the normal rate, “slowest tempo
possible while keeping the sentence flowing together without
3840
J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007
obviously inserting breaks between words” for the slow rate,
and “fastest tempo possible without making errors” for the
fast rate.
Analyses on duration of sentences and target vowels
confirmed 共1兲 that speakers spoke with three distinct speaking rates, and 共2兲 that they made a clear distinction between
short and long vowels within each rate. The sentence speaking rates across speakers were 239 共SD⫽24.8兲, 136 共SD
⫽15.2兲, and 105 共SD⫽6.4兲 ms/mora for slow, normal, and
fast rates, respectively, and the effect of rate was significant
关F共2 , 66兲 = 435.62, p ⬍ 0.001兴. Comparisons of any two rates
showed significant differences 关p ⬍ 0.001兴. The effect of
speaker was not significant 关F共1 , 66兲 = 0.17, p = 0.686兴, although there was a significant interaction of rate and speaker
关F共2 , 66兲 = 4.29, p = 0.018兴. The two speakers’ rates did not
significantly differ for slow 共244 vs 235 ms/ mora兲 and fast
rates 共106 vs 103 ms/ mora兲, but their normal rates were different 共128 vs 145 ms/ mora兲. The important point here,
however, is that there were three distinct speaking rates
across speakers. Analysis on target vowel duration also confirmed that the three speaking rates were distinct for both
speakers and for both short and long vowels. Each speaker’s
short vowels were significantly different among the three
rates 共Male 3: 152 vs 75 vs 64 ms; Female 3: 148 vs 83 vs
59 ms for slow, normal, and fast rates, respectively兲, and so
were the long vowels 共Male 3: 405 vs 165 vs 141 ms; FeHirata et al.: Japanese vowel length in varied rates
TABLE III. Summary of significant main effects and interactions in 4 ⫻ 2 ⫻ 3 ANOVA.
Effect
Test
Rate
Test⫻ rate
Group⫻ test
Group⫻ rate
Group⫻ test⫻ rate
F
df
p
104.83
67.95
11.24
2.51
2.23
3.37
共1,58兲
共2,116兲
共2,116兲
共3,58兲
共6,116兲
共6,116兲
0.000
0.000
0.000
0.068
0.045
0.004
male 3: 341 vs 165 vs 109 ms兲关p ⬍ 0.01兴. These results were
very similar to those of the materials in Hirata 共2004a兲, a
subset of which was used for the present training stimuli. An
NJ speaker identified vowel length of these test stimuli with
98.9% accuracy.
The pretest and post-test each included 5 vowels⫻ 2
lengths⫻ 3 rates⫻ 3 carrier sentences⫻ 2 speakers for a total
of 180 trials. Each test consisted of six blocks of 30 trials.
Within a given block, every trial used the same carrier sentence, spoken by the same speaker 共Table II兲. The stimuli
were presented in a random order across rates. No words,
sentences, or speakers from training were used in the tests.
The pretest and the post-test used identical words but different carrier sentences. Each block was preceded by two examples using the same nonsense words used in training 共e.g.,
/meme/ and /memeb/兲. Immediately after each example, the
answer, either “short” or “long,” was displayed on the
screen. Participants were asked to take a three-minute break
after the first three blocks.
The testing procedure was the same as the training procedure, except that participants did not receive any feedback
on their responses.
D. Analysis
Raw percent correct test scores were transformed into
rationalized arcsine units 共RAU; Studebaker, 1985兲 to correct
nonlinearity of the test score scale. A mixed-design Analysis
of Variance 共ANOVA兲 was conducted with these RAU
scores. Group 共slow-only, fast-only, slow-fast, control兲 was a
between-subjects factor, and test 共pretest, post-test兲 and rate
共slow, normal, fast兲 were repeated-measures factors. If there
is an overall effect of training, a significant group⫻ test interaction should be found. Furthermore, if an effect of training is found only for specific rates, a significant group
⫻ test⫻ rate should be found. Figures were made with the
raw percent correct scale for ease of interpretation.
we can say that participants were less able to identify length
of vowels in the same words spoken at faster rates than at
slower rates.
A significant main effect was found for test 共pretest:
65.8%; post-test: 73.6%兲, and a significant test⫻ rate interaction was found in ANOVA 共Table III兲, indicating that participants showed significant improvement from the pretest to
the post-test, but that the amount of improvement differed
across rates. The mean improvement, averaged over groups,
was highest for the slow rate 共from 69.9 to 81.1%兲 共11.2
percentage points improvement; below expressed as % improvement兲 关p ⬍ 0.001兴, less for the normal rate 共7.8% improvement from 64.9 to 72.7%兲 关p ⬍ 0.001兴, and lowest for
the fast rate 共4.1 % improvement from 62.7 to 66.8%兲 关p
= 0.001兴.
A group⫻ test interaction, which would indicate the
overall effectiveness of training, was only marginally significant 共Table III兲. The post-test scores were significantly
higher than the pretest scores for all four groups 共slow-fast
training group: 65.5 vs 74.6% 关t = 6.52, p ⬍ 0.001兴; slow-only
training group: 66.7 vs 75.3% 关t = 6.86, p ⬍ 0.001兴; fast-only
training group: 65.5 vs 73.3% 关t = 4.95, p ⬍ 0.001兴; control
group: 65.6 vs 69.9% 关t = 2.43, p = 0.028兴兲. Differences
among the four groups in the overall amount of improvement
between the pretest and post-test scores were small, as shown
in Fig. 1.
III. RESULTS
The results of ANOVA are summarized in Table III.
With regard to the participants’ performance on the three
speaking rates 共question 1兲, a significant effect of rate was
found. The mean test scores were significantly higher for the
slow 共75.6%兲 than normal rate 共68.8%兲, and for the normal
than fast rate 共64.8%兲. It is important to note that the tests for
the three rates included identical sets of words and sentences,
and the only difference was that of the speaking rates. Thus,
3841
J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007
FIG. 1. Overall pretest 共white boxes兲 and post-test 共dark boxes兲 scores of
the four groups. All test rates combined. The horizontal dashed line indicates
the chance level performance.
Hirata et al.: Japanese vowel length in varied rates
FIG. 2. Percent correct test scores of the four groups by rate 共a: slow; b: normal; c: fast兲. Significant difference between pretest 共white boxes兲 and post-test
共dark boxes兲 关*** p = ⬍ 0.001; ** p = ⬍ 0.01; * p = ⬍ 0.05兴, nonsignificant difference between pretest and post-test 关n.s. p ⬎ 0.05兴. The horizontal dashed lines
indicate the chance level performance.
The overall improvement made by the control group was
due to effects of repetition and exposure, since they took two
tests. As mentioned in the introduction, training was said to
be “effective” if the amount of improvement made by the
individual trained groups was significantly greater than that
made by the control group. Dunnett’s pairwise multiple comparison t-tests 共which contrasted the three training groups
against the control group兲 indicated that the amount of improvement made by the slow-fast training group 共9.1% improvement兲 was significantly greater than that by the control
共4.3%兲 关p = 0.040兴, indicating a genuine effect of training.
However, the slow-only training group’s improvement
共8.6%兲 was only marginally greater than the control group’s
improvement 关p = 0.059兴. The fast-only training group’s improvement 共7.8%兲 was not significantly greater than the control group’s improvement 关p = 0.116兴. Thus, with regard to
question 共2兲, we found that training with sentences, particularly slow-fast training, was indeed effective in enabling participants to improve their perceptual ability to identify Japanese vowel length contrast. With regard to question 共3兲, we
found that, as far as the overall test scores were concerned,
two-rate training was advantageous to one-rate training:
slow-fast training was most effective, while only a marginal
effect was found for slow-only training and no effect was
found for fast-only training.
A significant group⫻ test⫻ rate interaction was found in
ANOVA 共Table III兲, indicating that the four groups’ differing
amount of improvement from the pretest to the post-test depended on the speaking rate. Below, the four groups’ test
scores were examined separately for each rate.
Robust effects of training were found for the slow rate,
as shown in Fig. 2共a兲. All of the three training groups’ scores
improved significantly from the pretest to the post-test:
12.3% improvement 共means= from 70.4 to 82.7%兲 for the
fast-only training group 关p ⬍ 0.001兴; 15.5% improvement
共69.7 to 85.2%兲 for the slow-only training group 关p
⬍ 0.001兴; and 15.2% improvement 共70.2 to 85.4%兲 for the
slow-fast training group 关p ⬍ 0.001兴. In contrast, the control
group did not significantly improve their slow rate scores
共2.4% improvement from 69.4 to 71.8%兲 关p = 0.230兴. Thus,
3842
J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007
significant effects of training were found for all three trained
groups. It is interesting to note that the trained groups improved on the slow rate test scores regardless of the type of
training they had received.
The results of the normal rate test scores 关Fig. 2共b兲兴 were
similar to those of the slow rate scores. All of the three
training groups improved their scores significantly from the
pretest to the post-test. A 10.9% improvement 共means
= from 63.4 to 74.3%兲 was made by the fast-only training
group 关p ⬍ 0.001兴, 7.4% improvement 共65.5 to 72.9%兲 by the
slow-only training group 关p = 0.007兴, and 8.0% improvement
共64.4 to 72.4%兲 by the slow-fast training group 关p = 0.001兴.
In contrast, the control group’s 4.9% improvement 共66.2 to
71.1%兲 was not significant 关p = 0.072兴. Thus, effects of training on the normal rate test scores were found for all three
trained groups. Compared to the slow rate test scores, however, the amount of improvement was less for the normal
rate. It is noteworthy that the trained groups improved on the
normal rate scores even though none of them had received
normal rate materials in training.
The results of the fast rate test scores 关Fig. 2共c兲兴 differed
from those of the other rates. The fast-only training group,
rather unexpectedly, did not significantly improve on the fast
rate test scores 共1.6% improvement from 62.6 to 64.2%兲 关p
= 0.524兴, as was the case for the control group 共5.6% improvement from 61.5 to 67.1%兲 关p = 0.061兴. In contrast, the
slow-only training group made a significant 4.0% improvement 共64.7 to 68.7%兲 关p = 0.046兴, and finally, the 5.5% improvement made by the slow-fast training group 共61.9 to
67.4%兲 was statistically robust 关p = 0.013兴.
In summary, regarding question 共4兲 from the introduction, we found that the perceptual ability of NE speakers
generalized from trained rate共s兲 to tested rates. For example,
slow-fast training showed the strongest effects not only on
the participants’ overall scores, but also on all of the three
rate test scores separately, including the normal rate which
was not used in training. Slow-only training improved participants’ performance most prominently on the slow and
normal rate test stimuli, and much less but still significantly
Hirata et al.: Japanese vowel length in varied rates
on the fast rate stimuli, even though their overall, cross-rate
scores did not show a clear effect of training as a whole.
Least effective was fast-only training where an overall effect
of training was not found. Examining each rate separately,
effects of fast-only training were found in the slow and normal rate test scores, but not in the fast rate scores.
IV. DISCUSSION AND CONCLUSIONS
This study examined the extent to which training with
sentences of different rates helped native English speakers to
learn to perceive Japanese vowel length contrast in the sentence context. One major finding, related to our first question, was that the speaking rate of sentences was a crucial
factor in participants’ perceptual abilities. The participants’
test scores were lower for faster rates, and the amount of
improvement shown by trained participants was also smaller
for faster rates. These results are consistent with previous
findings by Tajima et al. 共2005兲 and Hirata 共2005a and
2005b兲, and suggest that even if non-native speakers are able
to perceive L2 contrasts at a slow rate, this does not guarantee the same ability at faster rates.
The second major finding, related to our second question, was that non-native speakers can learn to perceive L2
contrasts given only sentence input, supporting Hirata
共2004b兲. The present slow-fast training, in particular, which
provided only sentences but no words in isolation, was found
to improve the identification of Japanese vowel length contrast in the sentence context. The overall improvement of the
slow-fast training group was significantly greater than that of
the control group, which did not participate in training.
Slow-only and fast-only training also showed some limited
effects 共discussed further, below兲. This study provides additional evidence that adults are capable of learning difficult
L2 contrasts with laboratory training, and not only with syllables or words in isolation 共Jamieson and Morosan, 1986;
Logan et al., 1991; Bradlow et al., 1997; Tajima et al., 2002;
Wang et al., 1999兲, but also with sentences 共Hirata, 2004b
and 2004c兲.
It should be noted, however, that the overall effect of
training in the present study was fairly small, as indicated by
the improvement of 9.1 percentage points for slow-fast training. This is less improvement than that shown in some previous studies of L2 contrast training that used a twoalternative forced choice identification method. For example,
Bradlow et al.’s 共1997兲 subjects showed a significant improvement of 16 percentage points in distinguishing English
/ [ / and /1/. 共The mean pretest score in Bradlow et al. 共1997兲
was 65%, comparable to the pretest score of 65.8% in the
present study.兲 On the lower end, Logan et al.’s 共1991兲 subjects showed a significant improvement of 7.8 percentage
points 共though their mean pretest score was higher: 78.1%兲.
The present study’s low level of improvement might partially
be due to the fact that the test stimuli were embedded in
sentences. This makes sense given that trained participants
consistently showed less improvement in the sentence than
word context in Hirata 共2004b兲. However, the crucial role of
sentence training cannot be overlooked. Hirata 共2004b兲
found that participants who were trained with sentences
3843
J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007
made robust perceptual improvement in both the isolatedword and sentence contexts, but those who were trained only
with isolated words made less improvement in the sentence
context. Thus, even though training with sentences is more
time consuming, there is a practical advantage to using sentences in training, if our ultimate goal includes enabling nonnative speakers to perceive difficult L2 contrasts in fluent
speech. Following Hirata 共2004b兲, it would be interesting to
examine whether the present sentence training would improve participants’ ability to perceive isolated words as well
as words in sentences, as in the present experiment.
The third major finding of the present study relates to
the differential effects of the three types of training, as addressed in our third question. The participants who received
fast-only training improved least on their overall, cross-rate
test scores. Their overall improvement 共7.8 percentage
points兲 was not significantly greater than that of the control
group 共4.3 percentage points兲. Slow-fast training was slightly
more beneficial than slow-only training. The overall, crossrate improvement made by the slow-fast training group 共9.1
percentage points兲 was significantly greater than that made
by the control group, but the slow-only training group’s improvement 共8.6 percentage points兲 was only marginally
greater than the control group’s. In summary, only the slowfast training group improved significantly on the total test
scores, as well as on the scores for each of the three rates.
This result, that training with two rates was more effective
than training with one rate 共slow-only or fast-only兲, supports
Pisoni and Lively’s 共1995兲 high phonetic variability hypothesis. Their original hypothesis concerned the effectiveness of
training with variability in talker and phonetic context. The
present study provides support for the effectiveness of training with variability in speaking rate: the more variability in
the rates of training stimuli, the more effective for non-native
speakers’ perceptual learning. That slow-only training was
slightly advantageous over fast-only training might also be
consistent with the high variability hypothesis: variability of
vowel as well as sentence duration in the present training
was higher for the slow than fast rate 共see Hirata, 2004a兲.
This claim, however, should be examined in the future by
comparing effects of high-variability fast rate training versus
low-variability slow rate training.
The above conclusions in support of Pisoni and Lively’s
共1995兲 high variability hypothesis merit further discussion.
First, the difference among the three types of training was
quite small 共Fig. 1兲. Second, regardless of the type of training, participants showed a similar amount of improvement
for the slow and normal rates 共Fig. 2兲. Note that no participant heard normal-rate sentences in training, but that all
trained participants improved their test scores for the normal
rate. These results, relevant to our fourth question, imply that
even beginning L2 learners are able to normalize different
speaking rates fairly easily within the range of slow to normal speaking rates. The ability to normalize speaking rate in
native languages has been examined extensively, as reviewed
in the introduction, and this ability seems to be utilized for
L2 speech within a limited range of speaking rates even at an
early stage of L2 learning. This result is in line with Kato et
al.’s 共2004兲 finding that, with perceptual training, NE speakHirata et al.: Japanese vowel length in varied rates
ers were able to learn to adapt their categorical boundary
positions of Japanese vowel length contrast according to different contexts.
Also worthy of discussion are the results of fast-only
training, which perhaps are counterintuitive. Previous studies
suggested that abilities acquired during L2 training are
largely to deal with the kinds of stimuli that had been presented in training 共e.g., synthetic versus natural speech, word
versus sentence stimuli兲 共Strange and Dittmann, 1984; Morosan and Jamieson, 1989; Hirata, 2004c兲. This would predict that fast-only training should improve scores for the fast
rate, but not the other rates. However, the opposite results
were found: the fast-only training participants improved on
slow and normal stimuli, but not on fast stimuli. In addition,
this training was found to be least effective. Jamieson and
Morosan’s 共1986兲 perceptual fading technique, which successfully enabled Canadian francophones to identify the English /ð/-/␪/ contrast, is useful in understanding these results.
Their method initially reduces stimulus uncertainty by providing the longest consonant frication in the /ð#/-/␪#/ syllable continuum, and as training progresses, increases stimulus uncertainty by including syllables with progressively
shorter frication. This method helps learners to initially focus
their attention on critically relevant cues, and then to gradually learn to handle within-category acoustic variation. It is
possible, then, that participants in the present fast-only training suffered from not having first been provided with critically relevant cues with little uncertainty. The durational difference between short and long vowels is about 50 ms for the
fast rate, compared to about 150 ms for the slow rate 共Hirata,
2004a兲 and, thus, fast-only training participants might have
been unable to grasp the most critical cue, which is vowel
duration, solely from the fast stimuli.2
Another possible interpretation of the ineffectiveness of
fast-only training as compared to slow-only training might
concern short-term memory effects. It is possible that attentional resources available in short-term memory are limited
when learning to perceive unfamiliar non-native contrasts,
and that there is a higher demand in processing fast than
slow rate sentences. If so, then because of this higher demand, the fast rate speech signals might have degraded more
quickly than the slow rate speech signals in short-term
memory, and this might be why participants with fast-only
training were unable to learn the critical acoustic cue of
vowel duration as effectively. We expected from Pisoni and
Lively’s 共1995兲 high phonetic variability hypothesis that
single rate training, whether fast-only or slow-only, would
have the same effect on overall perceptual learning. However, the differential results of fast-only and slow-only training cannot be explained so simply by this hypothesis in regards to the issue of speaking rate variability. The present
results may suggest that different processing and memory
demands for slow versus fast rate stimuli are at play, and the
amount and quality of information transmitted from shortterm memory to long-term memory do matter. This factor
may augment Pisoni and Lively’s 共1995兲 high phonetic variability hypothesis in understanding how temporal characteristics of stimuli play a role in the ultimate formation of an L2
perceptual category.
3844
J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007
On a final note, in order to achieve more robust perceptual learning, we might wish to train subjects on sentences
spoken at a greater number of speaking rates. Introducing
several more speakers would also create additional, possibly
useful, speaking rate variation. A related issue is the distribution of stimuli among slow, normal, and fast rates. The
speaking rates of the present test materials were 239, 136,
and 105 ms/ mora for slow, normal, and fast rates, respectively, showing a greater difference between the slow and
normal rates than the difference between the normal and fast
rates. It is possible that differential effects of training might
appear more strongly if fast rate materials are even faster or
if three speaking rates are distributed more equally. A future
study should also examine whether non-native listeners benefit from randomizing rates at every trial instead of by block,
as the former has been found to provide more difficulty than
the latter 共Tajima et al., 2005兲. Finally, the carrier sentence in
the present training was held constant for the purpose of
examining the precise effect of speaking rate, but a future
study should also examine whether providing different carrier sentences would benefit learners. Although additional research is thus necessary, the present study reveals that sentential speaking rate is an important factor in L2 learners’
acquisition of duration-based distinctions.
ACKNOWLEDGMENTS
This study was supported by National Science Foundation Grant No. BCS0418246, given to Y.H. We thank Jon
Bernard for his assistance in creating the training program
and editing the manuscript, and Jacob Whiton, Connor
Forbes, Lucia Livak, Jess Worby, Carol Glenn, Stephen
Lambacher, and Spencer Kelly for their help in various
stages of this project. We also thank reviewers for their insightful comments on the earlier version of this manuscript.
See Hirata 共2004a兲 for more details of duration measurements of these
training materials. A subset of the materials was also used for NJ speakers’
perception in Hirata and Lambacher 共2004兲. In that study, NJ speakers
identified short and long vowels of these materials with 98.1% accuracy.
2
Here, we used differences in absolute vowel duration as an example of
speaking rate differences, but what the precise measure of variability
should ideally be is an empirical question. One could argue that relational
measures, such as ratios of short and long vowels, might be more appropriate than the measure of absolute duration. The present experiment
showed generalizability of learning from one rate to another, which might
suggest that relational measures are more suitable as a measure of variability. However, other evidence suggests that absolute duration plays an important role for non-native listeners 共Hirata, 2005a; Tajima, 2006兲, as well
as for native listeners when stimuli are impoverished 共Hirata and Lambacher, 2004兲.
1
Bradlow, A. R., Pisoni, D. B., Yamada, R. A., and Tohkura, Y. 共1997兲.
“Training Japanese listeners to identify English /r/-/l/: IV. Some effects of
perceptual learning on speech production,” J. Acoust. Soc. Am. 101,
2299–2310.
Denes, P. 共1955兲. “Effect of duration on the perception of voicing,” J.
Acoust. Soc. Am. 27, 761–764.
Francis, A. L., and Nusbaum, H. C. 共2002兲. “Selective attention and the
acquisition of new phonetic categories,” J. Exp. Psychol. Hum. Percept.
Perform. 28, 349–366.
Fujisaki, H., Nakamura, K., and Imoto, T. 共1975兲. “Auditory perception of
duration of speech and non-speech stimuli,” in Auditory analysis and perception of speech, edited by G. Fant and M. A. A. Tatham 共Academic
Press, London兲, pp. 197–219.
Hirata et al.: Japanese vowel length in varied rates
Gay, T. 共1981兲. “Mechanisms in the control of speech rate,” Phonetica 38,
148–158.
Gay, T., and Hirose, H. 共1973兲. “Effect of speaking rate on labial consonant
production,” Phonetica 27, 44–56.
Gay, T., Ushijima, T., Hirose, H., and Cooper, F. S. 共1974兲. “Effect of
speaking rate on labial consonant-vowel articulation,” J. Phonetics 2, 47–
63.
Gottfried, T. L., Miller, J. L., and Payton, P. E. 共1990兲. “Effect of speaking
rate on the perception of vowels,” Phonetica 47, 155–172.
Han, M. 共1992兲. “The timing control of geminate and single stop consonants
in Japanese: A challenge for nonnative speakers,” Phonetica 49, 102–127.
Hirata, Y. 共1990a兲. “Perception of geminated stops in Japanese word and
sentence levels,” The Bulletin of the Phonetic Society of Japan 194, 23–
28.
Hirata, Y. 共1990b兲. “Perception of geminated stops in Japanese word and
sentence levels by English-speaking learners of Japanese language,” The
Bulletin of the Phonetic Society of Japan 195, 4–10.
Hirata, Y. 共2004a兲. “Effects of speaking rate on the vowel length distinction
in Japanese,” J. Phonetics 32, 565–589.
Hirata, Y. 共2004b兲. “Training native English speakers to perceive Japanese
length contrasts in word versus sentence contexts,” J. Acoust. Soc. Am.
116, 2384–2394.
Hirata, Y. 共2004c兲. “Computer assisted pronunciation training for native
English speakers learning Japanese pitch and durational contrasts,” Comp.
Assist. Lang. Learning 17, 357–376.
Hirata, Y. 共2005a兲. “Effects of speaking rate on native English speakers’
identification of Japanese vowel length,” ISCA Workshop on Plasticity in
Speech Perception, A36.
Hirata, Y. 共2005b兲. “Factors affecting native English speakers’ identification
of Japanese vowel length,” First ASA Workshop on Second Language
Speech Learning, 13.
Hirata, Y., and Lambacher, S. G. 共2004兲. “Role of word-external contexts in
native speakers’ identification of vowel length in Japanese,” Phonetica 61,
177–200.
Hirata, Y., and Tsukada, K. 共2004兲. “The effects of speaking rates and vowel
length on formant movements in Japanese,” in Proceedings of the 2003
Texas Linguistics Society Conference: Coarticulation in Speech Production and Perception, edited by A. Agwuele, W. Warren, and S. H. Park
共Cascadilla Proceedings Project, Somerville, MA兲, pp. 73–85.
Jamieson, D. G., and Morosan, D. E. 共1986兲. “Training non-native speech
contrasts in adults: Acquisition of the English /ð/-/␪/ contrast by francophones,” Percept. Psychophys. 40, 205–215.
Johnson, T. L., and Strange, W. 共1982兲. “Perceptual constancy of vowels in
rapid speech,” J. Acoust. Soc. Am. 72, 1761–1770.
Kato, H., Tajima, K., Rothwell, A., Akahane-Yamada, R., and Munhall, K.
共2004兲. “Perception of phonemic length contrasts in Japanese with or
without a carrier sentence by native and non-native listeners,” Proceedings
of the 18th International Congress on Acoustics, I-609-612.
Kato, H., Wilson, A., Tajima, K., and Akahane-Yamada, R. 共2005兲. “Effects
of training conditions on the learning of phonemic length contrasts in
Japanese: Speaking rate and presentation context,” Nihon Onkyoo Gakkai
Kooen Ronbunshuu, 851–852.
Kawai, G., and Hirose, K. 共2000兲. “Teaching the pronunciation of Japanese
double-mora phonemes using speech recognition technology,” Speech
Commun. 30, 131–143.
Kondo, Y. 共1995兲. “Production of schwa by Japanese speakers of English: A
crosslinguistic study of coarticulatory strategies,” doctoral dissertation
共University of Edinburgh, UK兲.
Landahl, K., and Ziolkowski, M. 共1995兲. “Discovering phonetic units: Is a
picture worth a thousand words?” Papers from the 31st Regional Meeting
of the Chicago Linguistic Society 1, 294–316.
Landahl, K., Ziolkowski, M., Usami, M., and Tunnock, B. 共1992兲. “Interactive articulation: Improving accent through visual feedback,” in The Proceedings of the Second International Conference on Foreign Language
Education and Technology, edited by I. Shinjo et al. 共The Language Laboratory Association of Japan, Kasugai, Aichi, Japan兲, pp. 283–292.
Logan, J. S., Lively, S. E., and Pisoni, D. B. 共1991兲. “Training Japanese
listeners to identify English /r/ and /1/: A first report,” J. Acoust. Soc. Am.
89, 874–886.
Magen, H. S., and Blumstein, S. E. 共1993兲. “Effects of speaking rate on the
vowel length distinction in Korean,” J. Phonetics 21, 387–409.
Miller, J. L. 共1987兲. “Rate-dependent processing in speech perception,” in
3845
J. Acoust. Soc. Am., Vol. 121, No. 6, June 2007
Progress in the Psychology of Language, Vol. 3, edited by A. Ellis 共Erlbaum Associates, Hillsdale兲, pp. 119–157.
Miller, J. L., and Liberman, A. M. 共1979兲. “Some effects of later-occurring
information on the perception of stop consonant and semivowel,” Percept.
Psychophys. 25, 457–465.
Morosan, D., and Jamieson, D. G. 共1989兲. “Evaluation of a technique for
training new speech contrasts: Generalization across voices, but not wordposition or task,” J. Speech Hear. Res. 32, 501–511.
Newman, R. S., and Sawusch, J. R. 共1996兲. “Perceptual normalization for
speaking rate: Effects of temporal distance,” Percept. Psychophys. 58,
540–560.
Oguma, R. 共2000兲. “Perception of Japanese long vowels and short vowels
by English speaking learners,” Japanese-Language Education Around the
Globe 10, 43–55.
Pickett, E. R., Blumstein, S. E., and Burton, M. W. 共1999兲. “Effects of
speaking rate on the singleton/geminate consonant contrast in Italian,”
Phonetica 56, 135–157.
Pisoni, D. B. 共1997兲. “Some thoughts on “normalization” in speech perception,” in Talker Variability in Speech Processing, edited by K. Johnson and
J. W. Mullennix 共Academic Press, New York兲, pp. 9–32.
Pisoni, D. B., and Lively, S. E. 共1995兲. “Variability and invariance in speech
perception: A new look at some old problems in perceptual learning,” in
Speech Perception and Linguistic Experience: Issues in Cross-Language
Speech Research, edited by W. Strange 共York, Timonium, MD兲, pp. 433–
459.
Port, R. F. 共1977兲. The Influence of Speaking Tempo on the Duration of
Stressed Vowel and Medial Stop in English Trochee Words 共Indiana University Linguistics Club, Bloomington, IN兲, pp. 1-77.
Port, R. F., and Dalby, J. 共1982兲. “Consonant/vowel ratio as a cue for voicing in English,” Percept. Psychophys. 32, 141–152.
Sawusch, J. R., and Newman, R. S. 共2000兲. “Perceptual normalization for
speaking rate II: Effects of signal discontinuities,” Percept. Psychophys.
62, 285–300.
Strange, W., and Dittmann, S. 共1984兲. “Effects of discrimination training on
the perception of /r-1/ by Japanese adults learning English,” Percept. Psychophys. 36, 131–145.
Studebaker, G. A. 共1985兲. “A “rationalized” arcsine transform,” J. Speech
Hear. Res. 28, 455–462.
Tajima, K. 共2006兲. “Perceiving prosodic units in a second language: A crosslanguage comparison,” International Workshop on the Auditory Processing of Prosodic Features and its Applications: Acquisition of Linguistic
Organization and Human Audition, 40.
Tajima, K., Rothwell, A., and Munhall, K. G. 共2002兲. “Native and nonnative perception of phonemic length contrasts in Japanese: Effect of identification training and exposure,” J. Acoust. Soc. Am. 112, 2387.
Tajima, K., Kato, H., Yamada, R., Rothwell, A., and Munhall, K. G. 共2005兲.
“Native and non-native perception of phonemic length contrasts in Japanese: Effect of speaking rate,” Nihon Ninchi-Shinri Gakkai Dai 3-kai Taikai Happyoo Ronbunshuu, 69.
Toda, T. 共1997兲. “Strategies for producing mora timing by non-native speakers of Japanese,” in Acquisition of Japanese as a Second Language, edited
by Daini Gengo Shuutoku Kenkyuukai 共Bonjinsha, Tokyo兲, pp. 157–197.
Tsukada, K. 共1999兲. “An acoustic phonetic analysis of Japanese-accented
English,” doctoral dissertation 共Macquarie University兲.
Ueyama, M. 共2000兲. “Prosodic transfer: An acoustic study of L2 English vs
L2 Japanese,” doctoral dissertation 共University of California at Los Angeles兲.
Vance, T. J. 共1987兲. An Introduction to Japanese Phonology 共State University of New York Press, Albany, NY兲.
Verbrugge, R. R., and Shankweiler, D. 共1977兲. “Prosodic information for
vowel identity,” Haskins Laboratories Status Report on Speech Research
SR-51/52, 27–35.
Wang, Y., Spence, M. M., Jongman, A., and Sereno, J. A. 共1999兲. “Training
American listeners to perceive Mandarin tones,” J. Acoust. Soc. Am. 106,
3649–3658.
Wayland, S. C., Miller, J. L., and Volaitis, L. E. 共1994兲. “The influence of
sentential speaking rate on the internal structure of phonetic categories,” J.
Acoust. Soc. Am. 95, 2694–2701.
Yamada, T., Yamada, R. A., and Strange, W. 共1995兲. “Perceptual learning of
Japanese mora syllables by native speakers of American English: Effects
of training stimulus sets and initial states,” Proceedings of the 14th International Congress of Phonetic Sciences 1, 322–325.
Hirata et al.: Japanese vowel length in varied rates