UNIVERSITY OF CALIFORNIA
Los Angeles
Prosodic Transfer:
An Acoustic Study of L2 English vs. L2 Japanese
A dissertation submitted in partial satisfaction of the
requirements for the degree Doctor of Philosophy
in Applied Linguistics
by
Motoko Ueyama
2000
© Copyright by
Motoko Ueyama
2000
TABLE OF CONTENTS
Chapter 1: Introduction
1.1. Language transfer: L1 background plays a role in second language learning
1.1.1. Overview of language transfer theory
1.1.2. Language transfer in L2 speech development
1.2. Focus of the present study
1.2.1. Prosodic transfer
1.2.2. Bi-directional transfer
1.3. Current view of prosody
1.4. Prosodic phenomena investigated in the present study
1.4.1. Contrast between lexically accented and unaccented vowels
1.4.2. Contrast between English tense vs. lax vowels and between Japanese long vs. short vowels
1.4.3. Temporal organization across syllables
1.4.4. Minimal unit of prosodic segmentation at the word level
1.5. Major factors affecting the prosodic phenomena investigated
1.5.1. Intonation structure
1.5.2. Segment types
1.5.3. Phrase-final lengthening
1.5.4. Foot size and phrase size
1.6. Structure of the present study
Chapter 2: Word Accent Production in L2 English and L2 Japanese
2.1. Experiments 1 & 2: Word accent production in neutral declaratives
2.1.1. Goal
2.1.2. Word accent realization in L1 English vs. L1 Japanese
2.1.3. Expected patterns in L2 Japanese and L2 English
2.1.4. Speech materials
2.1.5. Subjects
2.1.6. Procedure
2.1.7. Results of Experiment 1 (English)
2.1.8. Results of Experiment 2 (Japanese)
2.1.9. Discussion of Experiments 1 and 2
2.2. Experiment 3: Word accent production after focus in L2 English
2.2.1. Prosodic context of the target word in Experiment 1: Nuclear position
2.2.2. Possible learning strategy in L2 English–L1 Japanese
2.2.3. Context of the target word in Experiment 3: Post-nuclear position
2.2.4. Expected patterns
2.2.5. Subjects
2.2.6. Speech materials
2.2.7. Procedure
2.2.8. Results of Experiment 3
2.2.9. Discussion of Experiment 3
2.3. Summary of Experiments 1-3
Chapter 3: Vowel Contrast in L2 English and L2 Japanese
3.1. Vowel system of English and Japanese
3.2. Characteristics of vowel length contrast in English vs. Japanese
3.2.1. Prosodic unit
3.2.2. Phonetic duration contrast
3.2.3. Vowel quality
3.2.4. Duration vs. vowel quality in the production of vowel contrasts
3.3. Problems
3.3.1. L2 Japanese–L1 English
3.3.2. L2 English–L1 Japanese
3.4. Goal
3.5. Method
3.5.1. Speech materials
3.5.2. Subjects
3.5.3. Procedure
3.6. Results of Experiment 4 (English)
3.6.1. Duration
3.6.2. Vowel quality
3.7. Results of Experiment 5 (Japanese)
3.7.1. Duration
3.7.2. Vowel quality
3.8. Discussion of Experiments 4 and 5
3.8.1. Vowel contrast in L1 English vs. L1 Japanese
3.8.2. Duration contrast in L2 English and L2 Japanese
3.8.3. Quality contrast in L2 English and L2 Japanese
3.9. Summary of Experiments 4 and 5
Chapter 4: Temporal Organization Across Syllables in L2 English and L2 Japanese
4.1. English stress vs. Japanese mora timings
4.1.1. Stress-foot and mora as basic timing units
4.1.2. Factors characterizing different timing types
4.2. Linguistic factors investigated
4.2.1. Effect of parts of speech on temporal organization
4.2.2. The lapse constraint and the culminativity requirement
4.3. Experiment 6 (English)
4.3.1. Expected patterns in L2 English
4.3.2. Method
4.3.3. Results of Experiment 6
4.3.4. Discussion of Experiment 6
4.4. Experiment 7 (Japanese)
4.4.1. Expected patterns in L2 Japanese
4.4.2. Method
4.4.3. Results of Experiment 7
4.4.4. Discussion of Experiment 7
4.5. Summary of Experiments 6 and 7
Chapter 5: Awareness of L2 Syllable Structures
5.1. English and Japanese syllable structures
5.2. Research questions
5.3. Method
5.3.1. Subjects
5.3.2. Materials
5.3.3. Procedure
5.4. Results
5.4.1. L2 English segmentation
5.4.2. L2 Japanese segmentation
5.5. Discussion
5.5.1. L1 vs. L2 word segmentation
5.5.2. Word segmentation in beginning vs. advanced L2 speech
5.5.3. Connection between awareness of L2 syllable structures and L2 segmentation production
5.6. Summary
Chapter 6: Conclusion
References
LIST OF FIGURES
Figure 2-1 F0 means & standard deviations of stressed vs. unstressed vowels for L1 English
Figure 2-2 F0 means & standard deviations of stressed vs. unstressed vowels for Speakers AE1 and BE3
Figure 2-3 Average F0 ratio of English stressed/unstressed vowels
Figure 2-4 Duration means and standard deviations of stressed and unstressed vowels for Speaker NE1 (L1 English)
Figure 2-5 Average duration ratio of English stressed/unstressed vowels
Figure 2-6 Average F0 ratio of Japanese accented/unaccented vowels
Figure 2-7 Duration means & standard deviations of accented vs. unaccented vowels for L1 Japanese
Figure 2-8 Average duration ratio of Japanese accented/unaccented vowels
Figure 2-9 Realization of nuclear pitch accent in English
Figure 2-10 Realization of post-nuclear word stress in English
Figure 2-11 F0 means & standard deviations of stressed vs. unstressed vowels for L1 English in post-nuclear position
Figure 2-12 F0 means & standard deviations of stressed vs. unstressed vowels for advanced L2 English in post-nuclear position
Figure 2-13 F0 means & standard deviations of stressed vs. unstressed vowels for beginning L2 English in post-nuclear position
Figure 2-14 Duration means & standard deviations of stressed vs. unstressed vowels for L1 English in post-nuclear position
Figure 2-15 Duration means & standard deviations of stressed vs. unstressed vowels for advanced L2 English in post-nuclear position
Figure 3-1 Vowel system of English and Japanese
Figure 3-2 Duration means and standard deviations of tense /i/ and lax /I/ for Speaker NE1 (L1 English)
Figure 3-3 Average durational ratio of English tense/lax vowels for Speaker NE1 (L1 English)
Figure 3-4 Mean and standard deviation of duration ratio of English tense/lax vowels for BE1, BE3 and AE1 (L2 English)
Figure 3-5 Mean and standard deviation of duration ratio of English tense/lax vowels for AE2, AE3 and BE2 (L2 English)
Figure 3-6 Average duration ratios of English tense/lax vowels
Figure 3-7 /i/ and /I/ in the vowel space of L1 English speakers
Figure 3-8 /i/ and /I/ in the vowel space of beginning L2 English speakers
Figure 3-9 /i/ and /I/ in the vowel space of advanced L2 English speakers
Figure 3-10 Euclidean distance (c) between the tense /i/ token T and the lax /I/ token L
Figure 3-11 Euclidean distance between English tense /i/ and lax /I/
Figure 3-12 Duration means and standard deviations of short and long vowels for Speaker NJ1 (L1 Japanese)
Figure 3-13 Duration ratios of Japanese long/short vowels for Speaker NJ1 (L1 Japanese)
Figure 3-14 Average duration ratios of Japanese long/short vowels
Figure 3-15 /i/ and /ii/ in the vowel space of L1 Japanese speakers
Figure 3-16 Spectral contrast in L1 Japanese vs. L1 English in the high front region
Figure 3-17 /i/ and /ii/ in the vowel space of AJ1, AJ2 and AJ3 (advanced L2 Japanese) and NE1 (L1 English)
Figure 3-18 /i/ and /ii/ in the vowel space of beginning L2 Japanese speakers
Figure 3-19 Average Euclidean distance between Japanese long /ii/ and short /i/
Figure 3-20 Mean and standard deviation of Japanese long and short vowels for AJ2, AJ3, BJ2 (L2 Japanese) and NJ1 (L1 Japanese)
Figure 3-21 Mean and standard deviation of Japanese long and short vowels for AE2, AE3 (advanced L2 English) and NE1 (L1 English)
Figure 4-1 Mean duration & standard deviation of unstressed /o/ for L1 English speakers in Experiment 6
Figure 4-2 Mean duration & standard deviation of unstressed /o/ for advanced speakers of L2 English in Experiment 6
Figure 4-3a Mean duration & standard deviation of vowel /o/ of 4-mora unaccented sentence (okane-da) for L1 Japanese speakers
Figure 4-4a Mean duration & standard deviation of vowel /o/ of 4-mora unaccented sentence (okane-da) for one L1 and three L2 Japanese speakers
Figure 4-4b Mean duration & standard deviation of vowel /o/ of 5-mora unaccented sentence (tomodatSi-da) for one L1 and three L2 Japanese speakers
Figure 4-5a Waveform and pitch contour of “tomodatSi-da” in BJ’s production
Figure 4-5b Waveform and pitch contour of “tomodatSi-da” in AJ2’s production
Figure 4-6 Mean duration and standard deviation of the vowel /o/ of “tomodatSi-da” for an L1 Japanese speaker (NJ4) and three L2 Japanese speakers
Figure 5-1 Syllable structures of English and Japanese in native speakers’ awareness
Figure 5-2 More examples of Japanese syllable structure
Figure 5-3 Average number of instances of the segmentation unit /no/ for English monosyllabic words
Figure 5-4 Percentage of native-English-like patterns produced by L1 English speakers and Japanese speakers of L2 English
Figure 5-5 Average number of occurrences of the segmentation unit /no/ as a function of the number of consonants in a syllable
Figure 5-6 Total number of occurrences of the segmentation unit /no/ for Japanese words
LIST OF TABLES
Table 2-1 Background information of L2 English speakers in Experiment 1
Table 2-2 Background information of L2 Japanese speakers in Experiment 2
Table 2-3 ANOVA results for F0 data of L1 and L2 English in Experiment 1
Table 2-4 ANOVA results for duration data of L1 English in Experiment 1
Table 2-5 ANOVA results for F0 data of L1 and L2 Japanese in Experiment 2
Table 2-6 ANOVA results for duration data of L1 Japanese in Experiment 2
Table 2-7 Summary of results of Experiment 1 (English)
Table 2-8 Summary of results of Experiment 2 (Japanese)
Table 2-9 F0 contrast in post-nuclear position
Table 2-10 Durational contrast in nuclear position
Table 2-11 Durational contrast in post-nuclear position
Table 2-12 Manipulation of F0 and duration in nuclear vs. post-nuclear
Table 2-13 F0 and duration (D.) contrasts in nuclear position for advanced L2 English
Table 2-14a F0 and duration (D.) contrasts in post-nuclear position for advanced L2 English
Table 2-14b F0 and duration contrasts in post-nuclear position for advanced L2 English
Table 3-1 English and Japanese vowel contrasts
Table 3-2 ANOVA results for duration data of L1 English in Experiment 4
Table 3-3 ANOVA results for duration data of L1 Japanese in Experiment 5
Table 3-4 Summary of L1 English and L1 Japanese patterns observed in Experiments 4 and 5
Table 3-5 Summary of duration contrast in L2 English and L2 Japanese vowels observed in Experiments 4 and 5
Table 3-6 Summary of quality contrast in L2 Japanese and L2 English vowels observed in Experiments 4 and 5
Table 4-1 Percentages of CV and V syllable types in English, Spanish and Japanese
Table 4-2 Background information of L2 English speakers in Experiment 6
Table 4-3 Test sentences in Experiment 6
Table 4-4 ANOVA results for L1 English speakers
Table 4-5 ANOVA results for advanced speakers of L2 English
Table 4-6 ANOVA results for phrase-medial moras in 4-mora and 5-mora sentences produced by L1 Japanese speakers
Table 4-7 Grouping of mora positions by L1 Japanese speakers
Table 4-8 Accent patterns in L1 and L2 Japanese
Table 4-9 ANOVA results for phrase-medial moras in 4-mora and 5-mora sentences produced by three L2 Japanese speakers
Table 4-10 Grouping of mora positions by one L1 and three L2 Japanese speakers
Table 4-11 Accent patterns in L1 and L2 Japanese
Table 5-1 Syllable structure in English and Japanese
Table 5-2 44 monosyllabic words used in the English phonological experiment
Table 5-3 24 words used in the Japanese phonological experiment
Table 5-4 Number of /no/ (representing the segmentation of English words by L1 English speakers and Japanese speakers of L2 English)
Table 5-5 Results of judgments on the segmentation of English words whose syllable structures are also possible in L1 Japanese
Table 5-6 Number of /no/ (representing the segmentation of Japanese words by L1 Japanese speakers and English speakers of L2 Japanese)
Table 5-7 Expected native patterns in the segmentation of Japanese words
Table 5-8 Number of /no/ (representing the segmentation of Japanese words containing short vs. long vowels)
Table 5-9 Average duration ratios of Japanese long/short vowels (based on the results of Experiment 4)
Table 5-10 Average duration ratios of Japanese long/short vowels
ACKNOWLEDGMENTS
This dissertation was finished thanks to the advice, encouragement and support of many
people. First and foremost, great thanks go to my advisor, Sun-Ah Jun. Since I knocked
on the door of her office six years ago, Sun-Ah has been encouraging, enthusiastic and
supportive. She not only guided me towards the right direction at every stage of my
dissertation, but also provided me with hope and inspiration, which kept me going on this
long and bumpy road. I would like to also thank Patricia Keating, Marianne Celce-Murcia
and Terry Au, who served on my dissertation committee. I owe special thanks to Patricia
Keating. Pat taught me the basics of experimental phonetics, gave me generous help when
I was working on my MA thesis and the proposal of this dissertation, invited me to work
as a regular member in the UCLA Phonetics Lab, and gave me valuable professional and
academic opportunities. I thank Marianne Celce-Murcia for her long-term help and warm
care. Finally, I thank Terry Au for her helpful comments and also for giving me the
opportunity to take part in her developmental psychology project. I also thank Bruce
Hayes, who guided me with great patience as the advisor of my MA thesis, which became
a starting point of my dissertation work. More importantly, I learned from him to jump off
the cliff with courage.
I thank all the speakers for their patience in participating in the experiments. They also
volunteered insightful comments and observations from the language learner’s point of
view. I also thank Tetsuya Sano, who generously offered his time to find and organize
subjects for me in Tokyo, and Hideo and Nariko Imamura, who, with great generosity,
hosted me at their place during my data collection in Tokyo.
I would like to thank the Department of Applied Linguistics, the Department of
Linguistics, and the Rotary Foundation for supporting my education financially.
I specially thank the Department of Applied Linguistics for providing me with a
dissertation year fellowship.
It was a precious experience to spend many years working in the UCLA Phonetics Lab.
I thank Peter Ladefoged, Ian Maddieson and Donca Steriade for their generosity and
helpful comments on my work. Being together with fellow students in the lab made my
long marathon comfortable and joyful: special thanks go to Adam Albright, Mary Baltazani,
Taehong Cho, Matt Gordon, Tetsuo Harada, Kim Thomas, and Jie Zhang. I also thank
Henry Tehrani for his technical help and for fun chats in his office.
I owe thanks to more people for their input, which helped to broaden the scope of this
study: Jennifer Venditti for providing insightful comments as a front runner of Japanese
phonetics and helpful observations as an L2 Japanese speaker, and also for her friendship;
Kikuo Maekawa and Yoshinori Sagisaka for encouragement and inspiring discussions;
Masayoshi Shibatani and Keiichi Tajima for helpful comments; Robert Port for
encouragement; Takayuki Arai, Tina Cambier-Langeveld, Nick Campbell, Susan Guion,
Tetsuo Harada, Yukari Hirata, Yuko Kondo, Haruo Kubozono, Kanae Nishi, Takashi
Otake, Yoshinori Sagisaka, Yoshiho Shibuya, Teruhisa Uchida and Natasha Warner for
letting me share their work.
I thank my friends I met in Los Angeles for sharing a lot of joyful times (often over
good food). I specially thank Adam Albright, Ivano Caponigro, Taehong and Hye-Jeong
Cho, Cathryn Donohue, Barry Griner, Chai-Shune Hsu, Paul Iverson, Sahyang Kim,
Leah Knightly, Yuri Kusuyama, Emi Moria, Yasuyo Sawaki, Shigeko Sekine and Yoshiko
Tomiyama (“Tokko”). I also would like to thank Federica, Roberto, Stefano Cracolici and
Kazumasa (“Terra chan”) for their long-distance friendship. Thanks to Alan Mar and Dan
Tauber for their long-term friendship, which saved me at some critical moments of my life.
I also thank Cecile Fougeron for her never-changing sisterhood, and Stefano Vegnaduzzo
for sharing a number of laughs and blues.
I thank my American parents, Jack and Elspeth Collins, and my Italian parents, Bruno
and Carla Baroni, for their affection, sense of humor and intellectual curiosity. I owe great
thanks to my family in Japan --- my father, mother, brother and sister --- for their
long-term support and encouragement, and also for understanding me, who has been wandering
away from home for many years.
Finally, I thank my husband, Marco Baroni, who has always been next to me to share
good times and bad times, to put up with my never-changing difficulty with English articles
and prepositions, and also to enjoy many wonderful things in life.
VITA
August 23, 1964
Born, Hiroshima City, Japan
August 1987
B.A., Education
University of Tottori, Tottori, Japan
1993-1994
Teaching Assistant,
Department of East Asian Languages and Cultures,
University of California, Los Angeles,
Los Angeles, California
June 1995
M.A., Teaching English as a Second Language.
(advisors: Profs. Bruce Hayes & Marianne Celce-Murcia)
Department of Applied Linguistics/TESL,
University of California, Los Angeles,
Los Angeles, California
1995-1999
Teaching Assistant,
Department of Linguistics,
University of California, Los Angeles,
Los Angeles, California
PUBLICATIONS
Ueyama, M. (1999). Durational reduction in L2 English produced by Japanese speakers.
Proceedings of the 14th International Congress of Phonetic Sciences.
Ueyama, M. (1999). An experimental study of vowel duration in phrase-final contexts in
Japanese. UCLA Working Papers in Phonetics 97.
Ueyama, M. & S.-A. Jun. (1998). Focus realization in Japanese English and Korean
English intonation. In Hajime Hoji (ed.), Japanese and Korean Linguistics 7. CSLI,
Stanford University Press.
Ueyama, M. (1997). Phonology and phonetics of L2 intonation: the case of Japanese
English. Proceedings of the 5th European Speech Conference.
Ueyama, M. & S.-A. Jun. (1997). Focus realization in Japanese English and Korean
English intonation. UCLA Working Papers in Phonetics 94.
Ueyama, M. (1996). Phrase-final lengthening and stress-timed shortening effects in the
speech of native speakers and Japanese learners of English. Proceedings of the 4th
International Conference on Spoken Language Processing. (also in UCLA Working
Papers in Phonetics 92)
Ueyama, M. (1995). Phrase-Final Lengthening and Stress-Timed Shortening Effects in the
Speech of Native Speakers and Japanese learners of English. M.A. thesis, UCLA.
ABSTRACT OF THE DISSERTATION
Prosodic Transfer:
An Acoustic Study of L2 English vs. L2 Japanese
by
Motoko Ueyama
Doctor of Philosophy in Applied Linguistics
University of California, Los Angeles, 2000
Professor Sun-Ah Jun, Chair
The effect of L1 characteristics on L2 speech has been investigated extensively at the
segmental level. This dissertation investigates how L1 prosodic features affect L2 prosodic
patterns in the production of the adult L2 speaker (i.e., prosodic transfer). Four speech
types were analyzed: 1) L2 English produced by L1 Japanese speakers; 2) L2 Japanese
produced by L1 English speakers; 3) L1 English; 4) L1 Japanese. This comparison is
interesting, since Japanese and English are typologically very different in terms of their
prosodic properties: e.g., English has stress accents, while Japanese has pitch accents;
Japanese has a phonemic length contrast, while English does not; English is a stress-timed
language, while Japanese is a mora-timed language.
Seven phonetic experiments were conducted to investigate three prosodic phenomena:
1) the contrast between lexically accented and unaccented vowels (Experiments 1-3); 2) the
contrast between English tense vs. lax vowels and between Japanese short vs. long vowels
(Experiments 4 and 5); 3) the temporal organization across syllables (Experiments 6 and 7).
Prosody is phonetically realized by multiple acoustic correlates, which differ from language
to language. In the analysis of the collected data, various correlates relevant to each of the
three prosodic phenomena were analyzed, taking both phonological and phonetic aspects
into account.
Additionally, a survey testing phonological awareness of L2 syllable
structure was conducted. The results of the survey were analyzed together with the results
of the phonetic experiments.
The results supported the following generalizations: first, the transfer patterns of L1
prosodic features in L2 prosody can vary greatly from correlate to correlate. Second,
different transfer patterns in the learner’s production can be explained by a difference
between L1 and L2 in terms of the phonological status of a relevant prosodic feature.
Third, there is a systematic interaction between the prosodic and segmental levels in the
transfer of L1 features in L2 speech development. Finally, an L2 speaker’s prosodic
system does not necessarily develop in a parallel manner for different dimensions of
prosody.
Chapter 1: Introduction
1.1. Language transfer: L1 background plays a major role in second
language learning
1.1.1. Overview of language transfer theory
It is a well-known fact that when adult speakers learn to speak a foreign or second language
(L2, henceforth), their pronunciation is commonly foreign-accented. One of the major
factors causing an accent is the effect of first language (L1, henceforth) characteristics on
L2 patterns: i.e., language transfer.
The term transfer, as extensively used in the first half of the 20th century, refers to “the
psychological process whereby prior learning is carried over into a new learning situation.
The main claim with regard to transfer is that the learning of task A will affect the
subsequent learning of task B” (Gass & Selinker 1994, p. 54).
In the context of language learning, the oldest definition of transfer considers transfer
as the carryover of prior linguistic knowledge to an L2 context. If this is true, it is expected
that any differences between the L1 and the L2 will create difficulties in L2 learning. This
idea was generalized later (in the period from the 1940s to the 1960s) as the Contrastive
Analysis Hypothesis (or the strong form of language transfer theory). The hypothesis was
linked to behaviorist learning theory, in which it is assumed “that language is habit and that
language learning involves the establishment of a new set of habits” (Gass & Selinker
1994, p. 60). The central notion of the Contrastive Analysis Hypothesis is stated by the
two advocates of the hypothesis, Lado (1957) and Weinreich (1953):
[T]hose elements that are similar to his native language will be simple for
him [the learner], and those elements that are different will be difficult.
(Lado 1957, p. 2)
The greater the difference between the two systems, i.e., the more
numerous the mutually exclusive forms and patterns in each, the greater is
the learning problem and the potential area of interference...
(Weinreich 1953, p.1)
The major claim of the Contrastive Analysis Hypothesis is that all L2 errors can be
predicted by identifying differences between L1 and L2 forms and patterns. Systematic L1
effects on L2 learning have been studied by assuming that L2 linguistic patterns can be
largely predicted on the basis of L1 characteristics, which transfer to L2 either positively or
negatively. Positive transfer takes place when L1 habits facilitate L2 learning, while
negative transfer occurs when L1 linguistic characteristics interfere with L2 learning.
Contrastive analysis provides a way of comparing the phonological, morphological and
syntactic systems of two languages. In contrastive studies, the following procedure is
commonly followed in order to predict L2 errors:
(1) description (i.e., a formal description of the two languages is made)
(2) selection (i.e., certain items, which may be entire subsystems, such as
the auxiliary system, are selected for comparison)
(3) comparison (i.e., the identification of areas of difference and
similarity)
(4) prediction (i.e., identifying which areas are likely to cause errors)
(Ellis 1985, pp. 25-26)
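The four steps can be read as a simple procedure. The following Python sketch is only an illustration of the strong form of the hypothesis and is not a method used in the present study. The feature labels and the function name contrastive_prediction are hypothetical; the feature values are the English and Japanese prosodic properties discussed later in this chapter.

```python
# Step 1, description: each language is described as a mapping from a
# feature label to a value. The labels are hypothetical, chosen for this sketch.
ENGLISH = {
    "lexical accent": "stress accent",
    "lexically specified accent": True,
    "phonemic vowel length": False,
    "timing": "stress-timed",
}
JAPANESE = {
    "lexical accent": "pitch accent",
    "lexically specified accent": True,
    "phonemic vowel length": True,
    "timing": "mora-timed",
}


def contrastive_prediction(l1, l2, selected):
    """Return (similar, different) feature lists for the selected items."""
    similar, different = [], []
    for feature in selected:              # step 2: selection
        if l1[feature] == l2[feature]:    # step 3: comparison
            similar.append(feature)
        else:
            different.append(feature)
    # Step 4, prediction: under the strong hypothesis, `different` holds the
    # predicted areas of difficulty and `similar` the predicted easy areas.
    return similar, different


similar, different = contrastive_prediction(JAPANESE, ENGLISH, sorted(ENGLISH))
print("predicted easy:", similar)
print("predicted difficult:", different)
```

A sketch of this kind predicts difficulty from differences alone, which is exactly the assumption that the empirical studies reviewed next call into question.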
The Contrastive Analysis Hypothesis became subject to empirical tests from the end of
the 1960s, and a number of counterexamples to it were provided. Two main criticisms were
presented: “[n]ot all actually occurring errors were predicted [by this hypothesis]; not all
predicted errors occurred” (Gass & Selinker 1994, p. 65). The first criticism was based on
the finding that many L2 errors are not attributable to L1 patterns, and that there can be
similarities between L2 patterns produced by speakers of different L1s.
For example,
Gilbert & Orlovic (1975) found that no article appears in the oral discourse of beginning L2
German speakers regardless of whether their L1 has articles or not. The second criticism
was based on the observation that some errors predicted by differences between the L1 and
the L2 do not occur, as shown in Kleinmann’s (1977) study.
He found that the
progressive aspect, which is absent in Arabic, was learned early and well by native Arabic
speakers learning English.
He suggested, on the basis of this finding, that when
something in L2 is very different from L1, there is a “novelty effect” which facilitates
learning L2 patterns. A similar case was also reported in Best’s (1999) study of L2
perception: new phonemes that are perceptually salient, e.g., Zulu clicks, can be identified
correctly by non-native adult speakers.
A number of empirical results along these lines led to the approach that has been
widely accepted in the field of applied linguistics for the past 10 or 15 years, in which
language transfer is considered not as a mechanical carryover of L1 structures, but as a
cognitive mechanism that underlies L2 acquisition. In the modified view, the L2 learner’s
language is perceived not as a variation on L1 but as an autonomous linguistic system that
dynamically changes under the influence of multiple factors including, but not limited to,
characteristics of the learner’s L1.
1.1.2. Language transfer in L2 speech development
Although a large body of research on L2 acquisition has been conducted in the syntactic
and morphological domains, there has also been a strong research interest in identifying L1
characteristics in ‘foreign accents’. In the 1950s, contrastive analyses of the production of
L2 phonemes were conducted within the theoretical framework of structuralism (e.g.,
Weinreich 1953; Haugen 1956). The goal of such analyses was to find how L1 phonemes
would interfere with L2 phonemes as produced by the language learner.
However, it soon became apparent that the sole consideration of phonemic categories
was inadequate to achieve a comprehensive understanding of how L1 characteristics affect
the patterns of the L2 learner’s speech. Further research on L2 speech learning “indicated
that the classic transfer hypothesis is an oversimplification, and pointed out the need for
detailed phonetic investigations that are not subject to the ab initio data reduction of
phoneme-based description” (Leather and James 1996, p. 278).
For example, Brière
(1968) trained native speakers of English to produce French, Arabic and Vietnamese
phonemes varying in similarity to English phonemes. Results showed that a contrastive
analysis of the phoneme inventories of L1 and L2 could not predict the observed patterns.
Rather, the relative difficulty of learning different sounds could be explained by taking the
phonetic level into account (see Leather and James 1996 for a review).
Further evidence for language transfer at the phonetic level was presented in Flege’s
(1987) study. He investigated a case in which L2 learners do not have to learn new
segmental contrasts since L1 and L2 have similar phones. In this context, the Contrastive
Analysis Hypothesis predicts that L2 learners will not have any difficulties, since
difficulties should only arise from differences between the L1 and the L2 in phonemic
inventories.
However, Flege found that L2 phones similar to L1 phones are not
necessarily easy to acquire, since even experienced L2 speakers often retain L1 phonetic
habits in their production of L2 phones. For example, /t/ is found in both French and
English, but it is produced with a short-lag VOT (voice-onset-time) and dental place of
articulation in French, and with a long-lag VOT and alveolar place of articulation in
English. Flege’s (1987) data showed that experienced French speakers of L2 English
produced English /t/ using phonetic characteristics of L1 French, and vice versa for the
production of French /t/ in L2 French by L1 English speakers.
The aforementioned studies indicate that two sources of L1 speech need to be
considered for a better understanding of the general effect of the L1 sound system on L2
speech development: the L1 phonological system, and the L1 phonetic patterns surfacing in
the realization of the phonological system.
More aspects of L2 speech and more
combinations of the learner’s L1 and L2 have yet to be investigated from phonological and
phonetic perspectives.
1.2. Focus of the present study
1.2.1. Prosodic transfer
The systematic effect of L1 characteristics on L2 speech has been investigated extensively
at the segmental level (for reviews, see Flege 1987, 1995; Leather & James 1996). For a
valid assessment of how L1 characteristics affect L2 speech patterns, it is also necessary to
investigate the suprasegmental or prosodic aspect of language transfer phenomena. I will
call the effect of L1 prosodic characteristics on the L2 speech system prosodic transfer.
Prosody broadly refers to the phonological organization of individual sounds (i.e.,
segments) into higher-level constituents and also to the pattern of relative prominence
within these constituents, which is cued by variation of F0, duration, amplitude and
segment quality (adapted from Shattuck-Hufnagel & Turk 1996). For example, intonation
and timing patterns are considered prosodic phenomena.
There have been some
instrumental studies on the production of L2 prosody, and in particular L2 intonation (e.g.,
Gårding 1981 for L2 French–L1 Swedish and Greek; Todaka 1990 for L2 English–L1
Japanese; Argyres 1996 for L2 English–L1 Greek; Ueyama & Jun 1998 for L2 English–L1
Korean or Japanese; Jun & Oh 2000 for L2 Korean–L1 English), and L2 timing (e.g.,
Levitt 1992 for L2 French–L1 English; Mochizuki-Sudo & Kiritani 1992, Ueyama 1995,
Shibuya 1997 for L2 English–L1 Japanese; Anderson-Hsieh & Venkatagiri 1994 for L2
English–L1 Chinese; Uchida 1996 for L2 Japanese–L1 Chinese). Each of these studies
examined a single prosodic phenomenon. We believe that the general mechanism of
prosodic transfer can be better understood by considering multiple prosodic phenomena
within the same study. The present study investigates three prosodic phenomena in L2
speech production: 1) the contrast between lexically accented and unaccented vowels; 2) the
contrast between English tense vs. lax vowels and between Japanese short vs. long
vowels; 3) the temporal organization across syllables.
1.2.2. Bi-directional transfer
Most past studies of L2 speech development examined one direction of language transfer
(i.e., either transfer from language A to language B or transfer from language B to
language A). Very few studies of L2 speech production have examined the two directions
of language transfer within the same study (from language A to language B and from
language B to language A). At the segmental level, Flege’s (1987) study on the production
of /t/, mentioned earlier in this chapter, examined both L2 French produced by native
English speakers and L2 English produced by native French speakers. However, as far as
I know, no earlier study has investigated both directions of language transfer at the
prosodic level of L2 speech production. The analysis of both directions of transfer (e.g.,
L1 Japanese to L2 English and L1 English to L2 Japanese) is expected to be more
informative than the analysis of only one direction in investigating prosodic transfer.
The present study will instrumentally investigate prosodic transfer in two types of L2
adult speech: L2 English produced by native speakers of Tokyo Japanese and L2 Japanese
produced by native speakers of American English (henceforth, L2 English–L1 Japanese
and L2 Japanese–L1 English, respectively). A comparison of these two L2 types should
be interesting, since Japanese and English are very different in terms of their prosodic
characteristics. This comparison of prosodic patterns in the two L2 types appears to be
feasible, since the prosodic characteristics of L1 Japanese and L1 English are relatively well
described under the same framework (relevant references are discussed later in this
chapter).
1.3. Current view of prosody
Over the past 20 years, prosodic theory has been extensively developed. In the 1960s and
1970s, it was commonly assumed in generative grammar and psycholinguistic studies that
the structural constituents of spoken utterances of a sentence correspond to those predicted
by the syntax. Many studies showed that major acoustic phonetic phenomena such as
intonational boundaries, preboundary lengthening or pausing, tend to occur at major
syntactic boundaries (e.g., Klatt 1975).
However, the examination of large corpora of utterances actually produced by speakers
showed notable discrepancies with the results predicted by syntax. Although prosodic
structure is largely determined by syntactic structure, it has been found that the two structures are
not isomorphic (see Shattuck-Hufnagel & Turk 1996, Fougeron 1999 for reviews; and Gee
& Grosjean 1983, Ferreira 1992, Jun 1993, 1998 among others for experimental evidence).
Furthermore, it was found that extra-syntactic factors influence the constituency of spoken
utterances. Research in the 1980s and 1990s provided evidence for the claim that
“speakers make active use of prosodic elements in the production of spoken utterances, and
that systematic variations in the phonetic realization of phonemic segments and features
depends at least in part on prosodic structure” (Shattuck-Hufnagel & Turk 1996, p. 225;
also see Jun 1993; Fougeron & Keating 1998; Fougeron 1999; Keating et al. [to appear]).
Currently, the existence of prosodic structure as an independent component of language is
widely recognized.
Prosody can be analyzed at two levels:
• At the level of physical realization in the speech signal, in terms of acoustic
patterns of F0, duration, amplitude, spectral tilt, and segmental reduction,
and their articulatory correlates.
• And at the level of its utterance-structuring function (which determines its
physical realization). Prosodic structure is “the organization structure of
speech” and “a complex grammatical structure that must be parsed in its
own right” (Beckman 1996). This structure is organized in prosodic
constituents defined as “domains” in which particular prosodic phenomena
are realized. These phenomena are considered as prosodic because they do
not refer to segments, but to higher level constituents. In this sense, they
are suprasegmental phenomena.
(adapted from Shattuck-Hufnagel & Turk 1996 and Fougeron 1999)
The realization of prosodic structure can be analyzed in terms of two main types of
organization of continuous speech: tonal organization (i.e., intonational structure) and
temporal organization (i.e., rhythmic structure). Both types of organization are acoustically
realized as suprasegmental phenomena, and they can be analyzed as the variation of
physical correlates of speech. Tonal organization is acoustically realized as F0 variation,
and temporal organization is realized as variation in duration. Both types of prosodic
organization are largely determined by phonological categories, such as the position of
word accent or stress (e.g., in English, lexically stressed syllables are longer in duration
and higher in F0 than unstressed syllables) or by the distribution of segments (e.g.,
Japanese phonemically long segments are at least two times longer than short segments,
and in Japanese poetry, long segments are counted as two beats, while short segments
count as one beat).
As languages are different in terms of their segmental inventories, they also differ in
terms of their prosodic properties. This, of course, applies to English and Japanese. The
present study specifically investigates the effect of L1 prosodic characteristics on L2
prosodic patterns in the production of two L2 types, i.e., L2 English–L1 Japanese and L2
Japanese–L1 English.
The investigation of the study focuses on the following three
aspects of prosody: 1) the contrast between lexically accented and unaccented vowels; 2)
the contrast between English tense vs. lax vowels and between Japanese short vs. long
vowels; 3) the temporal organization across syllables. English and Japanese differ in
all three of these aspects. First, English has a stress accent, while Japanese has a pitch accent.
Second, Japanese has a phonemic length contrast, while English does not. Third, English
is a stress-timed language, while Japanese is a mora-timed language. In the rest of this
chapter, English and Japanese will be compared with respect to these three aspects and
other important factors affecting prosodic structure.
1.4. Prosodic phenomena investigated in the present study
1.4.1. Contrast between lexically accented and unaccented vowels
Although they both possess lexically specified accents, English and Japanese differ in
terms of how they manipulate the three acoustic correlates of accent: F0, duration and
intensity. English word accent (i.e., stress) is cued by the combination of all three main
correlates[1]: a significant change in F0[2], longer duration and greater amplitude (Fry 1955;
Liberman 1960; Lea 1977; Nakatani et al. 1982; Beckman 1986). In contrast, Japanese
accented and unaccented syllables are reliably distinguished only by fundamental frequency
(F0) differences, and not by duration or intensity (Han 1969; Weitzman 1969; Hoequist
1983a, 1983b; Sugito 1982a, 1990; Beckman 1986). Following Beckman (1986), if we
define ‘stress’ as lexically specified prominence involving all three physical correlates, we
can say that English lexical accents are stress accents, while Japanese lexical accents are
non-stress accents.
[1] Another correlate of accent on which much less research has been conducted is vowel quality (Lehiste
1970). This correlate will not be included in our analysis of the production of L2 word accent.
[2] Whether accent is associated with an increase or decrease of F0 depends on the types of pitch accents
associated with stressed syllables. See the discussion about intonation in the section on control factors.
The acoustic difference between Japanese and English accents is paralleled by
perceptual differences. Perception studies of English stress accents (Fry 1958; Lea 1977;
Nakatani & Aston 1978; Beckman 1986) essentially agree on the point that “the listener
relies on differences in (1) the length of the syllables, (2) the loudness of the syllables and
(3) the pitch of the syllables...” (Fry 1958, pp. 127-128). On the other hand, Japanese
speakers rely on pitch as the main cue in order to distinguish accented syllables from
unaccented syllables (Beckman 1986).
These studies have shown that the three prosodic correlates (F0, duration and intensity)
are treated differently in Japanese and English in the production and perception of lexically
accented and unaccented syllables. It is thus interesting to see how the two systems interact
with each other in L2 speech development. The first three experiments of the present study
will concern the two L2 types with respect to the manipulation of F0 and duration in the
production of L2 word accent.
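The definition of stress adopted from Beckman (1986) can be stated as a simple set test. The following Python sketch is an illustration only: the names ALL_CORRELATES, ACCENT_CUES and accent_type are hypothetical, and the cue sets merely restate the production findings summarized above (vowel quality is left out, as in the analysis itself).

```python
# The three physical correlates of lexical accent considered in this section.
ALL_CORRELATES = frozenset({"F0", "duration", "amplitude"})

# Correlates that reliably distinguish accented from unaccented syllables in
# production, as summarized in the text above (not new measurements).
ACCENT_CUES = {
    "English": frozenset({"F0", "duration", "amplitude"}),
    "Japanese": frozenset({"F0"}),
}


def accent_type(cues):
    """Classify an accent system: 'stress' requires all three correlates."""
    if cues >= ALL_CORRELATES:
        return "stress accent"
    return "non-stress accent"


for language, cues in ACCENT_CUES.items():
    unused = sorted(ALL_CORRELATES - cues)
    print(language, "->", accent_type(cues), "| correlates not used:", unused)
```

The correlates listed as not used for Japanese (duration and amplitude) are the ones for which the two systems differ, and duration is one of the two correlates examined in the first three experiments.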
1.4.2. Contrast between English tense vs. lax vowels and between
Japanese long vs. short vowels
In both English and Japanese, duration is used as an acoustic correlate of vowel contrast,
and the distribution of vowels greatly conditions rhythmic organization. In English, both
duration and vowel quality contribute significantly to the contrast between tense and lax
vowels (e.g., meat /mit/ vs. mitt /mɪt/; pool /pul/ vs. pull /pʊl/). Tense vowels are longer in
duration (e.g., /i/ is longer than /ɪ/) and more peripheral in the vowel space (e.g., /i/ is more
peripheral than /ɪ/) than lax vowels (Peterson & Lehiste 1960; Hillenbrand et al. 1995).
Japanese displays a phonemic contrast between short and long consonants and vowels
(e.g., /ka'tta/ ‘won’ vs. /ka'ta/ ‘shoulder’; /too'ru/ ‘pass’ vs. /to'ru/ ‘catch’). Such a
contrast can be characterized in both phonological and phonetic terms. Short and long
segments are phonologically categorized by L1 Japanese speakers as monomoraic and
bimoraic, respectively (the mora is an abstract timing unit of Japanese, as discussed in the
following section), and the phonetic durations of short and long segments are
systematically differentiated. However, unlike for English tense and lax vowels, there is
no significant difference in vowel quality between Japanese short and long vowels (e.g., /i/
and /ii/).
The comparison of vowel contrasts in English and Japanese shows that they greatly
differ in terms of the phonemic status and phonetic treatment of duration and vowel quality,
the two acoustic correlates of vowel contrast. For duration, while the Japanese contrasts
are characterized in both phonological and phonetic terms, the English contrasts are
phonetic but not phonological. In terms of vowel quality, English tense vowels are more
peripheral than their lax counterparts in similar regions of the vowel space. In that sense,
we can say that vowel quality plays a phonemic role in the production of English contrasts.
However, unlike the case of English tense and lax vowels, there is no significant quality
difference between Japanese short and long vowels.
The consideration of both phonological and phonetic properties of vowel contrasts
leads us to ask the following two questions. How do L1 vowel contrasts transfer at the
phonetic level, given the difference between English and Japanese in terms of the
phonological status of vowel contrasts? The goal of Experiments 4 and 5 is to answer
these questions by investigating durational and spectral patterns of vowel contrasts in the
production of L2 English–L1 Japanese and L2 Japanese–L1 English.
1.4.3. Temporal organization across syllables
Languages have been traditionally classified into different timing categories in terms of the
fundamental units of timing or duration (for a summary, see Beckman 1992; Tajima 1998).
English is classified as a stress-timed language, in which the fundamental isochronous unit
of timing is the stress foot (Pike 1945; Abercrombie 1967). French is classified as a
syllable-timed language in which the syllable is used as a basic timing unit.
Finally,
Japanese is classified as a mora-timed language having the mora as the basic timing unit
(Jinbo 1927; Bloch 1950; see Warner & Arai [submitted] for a review).
However, many later studies have found that none of these timing units corresponds to
a constant isochronous interval in the acoustic signal. Rather, they are abstract units without
stable acoustic manifestations.
Despite the absence of acoustic evidence, it has been proposed in recent
psycholinguistic studies that the timing units play a crucial role in segmenting continuous
speech when speech input is processed. Evidence from studies of adult speech processing
(see Cutler 1996 for a review) suggests that English speakers do use the stress foot, French
speakers the syllable, and Japanese speakers the mora as preferred units of segmentation.
This indicates that the timing units are a part of grammar and psychologically real.
Previous studies also show that different timing types are characterized by other
properties of speech. For example, it has been pointed out that the distribution of different
types of syllable structures greatly affects the average durational difference between
stressed and unstressed syllables. In stress-timed languages, more complex syllable
structures tend to be found in stressed syllables, while simple structures (CV) occur in
unstressed syllables. This difference results in a higher average number of segments in the
stressed syllables, which in turn explains the higher average duration of the stressed
syllables (see Fant, Kruckenberg & Nord 1991). The inventory of syllable types is more
limited in syllable-timed languages like French, Italian or Spanish, and even more limited in
mora-timed languages like Japanese. It has been found that differences between L1 and L2
syllable structure cause problems in L2 speech production. For example, when some L2
syllables are not legitimate in L1 syllable structures (typically, they have a more complex
structure than what is allowed in L1), the beginning language learner phonologically
reorganizes L2 syllables to adapt them for possible L1 syllable structures (see Broselow
1983 and Broselow & Park 1995 for case studies of epenthesis errors in L2 production).
This reorganization of the structures of L2 syllables impacts L2 temporal organization.
Crosslinguistic differences in timing patterns are also affected by the treatment of
duration in lexical accent realization.
In English, stressed syllables are longer than
unstressed syllables, and consequently the durational patterns of utterances are strongly
affected by the distribution of lexical stress. On the other hand, in Japanese, lexical accent
does not affect the duration of syllables.
Consequently, the durational patterns of
utterances are independent of the distribution of lexical accent, and depend instead on the
distribution of phonemic short and long segments. In other words, in English, lexical
accent properties and rhythmic organization are closely related. In Japanese, these two
aspects are independent, and rhythmic organization largely depends on the phonemic
distribution of short and long segments.
Considering all these factors affecting temporal organization, Experiments 6 and 7
investigate the effect of L1 prosodic characteristics on temporal organization in L2
English–L1 Japanese and L2 Japanese–L1 English.
1.4.4. Minimal unit of prosodic segmentation at the word level
English and Japanese also differ in terms of the timing units used in segmenting continuous
speech, as discussed earlier: the stress foot in English and the mora in Japanese. In the
case of word segmentation, native speakers of English use the syllable as the minimal
segmentation unit, while native speakers of Japanese use the mora.
For example, the
English word corn is not further segmented by native speakers of English, since this word
is monosyllabic. The same word is borrowed and lexicalized as koon in Japanese. By
employing the mora as the minimal segmentation unit, koon is further segmented into 3
moras (ko.o.n) by native Japanese speakers (note that the coda nasal is also counted as an
independent mora).
The difference between the two languages in terms of which
phonological unit is employed as the minimal segmentation unit raises an important
question: Do L2 speakers become aware of L2 prosodic word structure? To answer this
question, a phonological survey was additionally conducted in the present study.
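The mora-based segmentation described above can be made concrete with a short Python sketch. It is a simplified illustration and not the survey instrument: the function name segment_moras is hypothetical, the input is assumed to be a plain romanization without accent marks, and only the cases mentioned in this chapter are handled (a vowel, the second half of a long vowel, the coda nasal, and the first half of a long consonant each count as one mora).

```python
VOWELS = set("aeiou")


def segment_moras(word):
    """Split a romanized Japanese word into moras (simplified rules)."""
    moras = []
    i, n = 0, len(word)
    while i < n:
        ch = word[i]
        if ch in VOWELS:
            # A vowel on its own, including the second half of a long vowel.
            moras.append(ch)
            i += 1
        elif ch == "n" and (i + 1 == n or (word[i + 1] not in VOWELS
                                           and word[i + 1] != "y")):
            # Coda nasal: counted as an independent mora.
            moras.append(ch)
            i += 1
        elif i + 1 < n and word[i + 1] == ch:
            # First half of a long (geminate) consonant.
            moras.append(ch)
            i += 1
        else:
            # Onset consonant(s) plus the following vowel form one mora.
            j = i
            while j < n and word[j] not in VOWELS:
                j += 1
            if j == n:
                raise ValueError(f"no vowel after {word[i:]!r}")
            moras.append(word[i:j + 1])
            i = j + 1
    return moras


print(segment_moras("koon"))   # ['ko', 'o', 'n']
print(segment_moras("katta"))  # ['ka', 't', 'ta']
print(segment_moras("tooru"))  # ['to', 'o', 'ru']
```

Under these rules koon comes out as three units, whereas the English source word corn remains a single unit if the syllable is the minimal segmentation unit.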
1.5. Major factors affecting the prosodic phenomena investigated
In order to investigate prosodic transfer in L2 English and L2 Japanese, the present study
examines three acoustic correlates of prosody, i.e., F0, duration and vowel quality. It is
well known that the variation of these acoustic correlates in prosodic realization is
influenced by both segmental effects and other prosodic factors. Previous studies report
that phonetic realization of the physical correlates of suprasegmental features (e.g., F0,
duration, intensity or vowel quality) is highly affected by multiple linguistic factors. In the
seven phonetic experiments of the present study, the following major factors are controlled
in order to strictly examine the three prosodic phenomena of interest: 1) intonational
structure; 2) segment types; 3) phrase-final lengthening; 4) foot size and phrase size.
1.5.1. Intonational structure
As reviewed in the last section, the lexical specification of accents is phonetically realized
by language-specific manipulations of the prosodic correlates: for English stress accents,
this means a change in F0, longer duration and greater amplitude; for Japanese non-stress
accents, there is only a change in F0. At the phrase and sentence level, in both languages,
the values of the prosodic correlates can vary in different intonation patterns. This point
will be explained for the case of English, adopting the phonological model of English
intonation proposed by Pierrehumbert and her colleagues (Pierrehumbert 1980; Beckman &
Pierrehumbert 1986).
This model of English intonation can be summarized as follows (the summary is
adapted from Ueyama & Jun 1998). Continuous F0 contours are analyzed as sequences of
underlying H and L tones. These tones are categorized as one of three types: pitch accents;
phrasal tones (or accents); boundary tones. The pitch accent is associated with the stressed
syllable of the phrase, and by this association, the stressed syllable of a certain word
receives pitch prominence. The boundary tone indicates the end of an Intonational Phrase
(IP), which is the highest level of English prosodic organization. Finally, the phrasal tone
indicates the end of an intermediate phrase, which is the second highest level below an IP,
and it covers the space between the last pitch accent and the boundary tone in the IP. In
English, there are six types of pitch accents (H*, L*, H+L*, H*+L, L+H*, L*+H), two
types of phrasal tones (H-, L-), and two types of boundary tones (L%, H%). These three
types of tones (pitch accent, phrasal tone, and boundary tone) are hierarchically organized,
reflecting the hierarchical organization of an utterance. In the intonational structure of
English, one IP has at least one pitch accent, one phrasal and one boundary tone (and
exactly one phrasal and boundary tone if the IP is also one intermediate phrase), and one IP
can have more than one intermediate phrase. Thus, the internal structure of an IP is formed
by a sequence of pitch accents and phrasal tones. If there is more than one pitch accent in
the intermediate phrase, the last pitch accent is the most prominent one in English, and it is
labeled as the nuclear pitch accent.
Intonation contours provide information about
phrasing, which varies depending on multiple factors such as the types and numbers of
pitch accents, syntactic structure, speech rate, and the presence/absence and locations of
pragmatic focus (see Jun 1993, Shattuck-Hufnagel & Turk 1996 for reviews).
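The hierarchical organization of tones summarized above can be sketched as a small well-formedness check. The following Python code is an illustration under stated assumptions and is not part of the cited model's formalism: the function name parse_ip and the dictionary layout are hypothetical, and the sketch assumes that every intermediate phrase, not only the IP as a whole, contains at least one pitch accent.

```python
# Tone inventory of English as listed in the text.
PITCH_ACCENTS = {"H*", "L*", "H+L*", "H*+L", "L+H*", "L*+H"}
PHRASAL_TONES = {"H-", "L-"}
BOUNDARY_TONES = {"L%", "H%"}


def parse_ip(tones):
    """Group the tones of one Intonational Phrase into intermediate phrases.

    Each intermediate phrase is a run of pitch accents closed by a phrasal
    tone; the IP as a whole is closed by one boundary tone. The last pitch
    accent of each intermediate phrase is recorded as the nuclear accent.
    """
    if not tones or tones[-1] not in BOUNDARY_TONES:
        raise ValueError("an IP must end in a boundary tone")
    phrases, accents = [], []
    for tone in tones[:-1]:
        if tone in PITCH_ACCENTS:
            accents.append(tone)
        elif tone in PHRASAL_TONES:
            if not accents:
                # Assumption of this sketch: no accentless intermediate phrase.
                raise ValueError("intermediate phrase without a pitch accent")
            phrases.append({"accents": accents,
                            "nuclear": accents[-1],
                            "phrasal": tone})
            accents = []
        else:
            raise ValueError(f"unexpected tone {tone!r}")
    if accents or not phrases:
        raise ValueError("the last intermediate phrase lacks a phrasal tone")
    return {"phrases": phrases, "boundary": tones[-1]}


ip = parse_ip(["H*", "L+H*", "L-", "H*", "L-", "L%"])
for phrase in ip["phrases"]:
    print(phrase["accents"], "nuclear:", phrase["nuclear"], "phrasal:", phrase["phrasal"])
print("boundary:", ip["boundary"])
```

In the usage example, one IP contains two intermediate phrases; the first has two pitch accents, of which the second (L+H*) is the nuclear one.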
Pierrehumbert’s model of English intonation has been applied to Japanese by Beckman
& Pierrehumbert (1986), Pierrehumbert & Beckman (1988) and Venditti (1995, 2000).
Japanese intonation patterns are also analyzed as sequences of H and L tones, and these
tones are categorized into pitch accents, phrasal tones and boundary tones. The Japanese
tone inventory is much smaller than the English one. According to The Japanese ToBI
Labeling Guidelines (Venditti 1995), which provide a system for transcribing Tokyo
Japanese, there is one type of pitch accent (H*+L), one type of phrasal tone (H-) and three
types of final boundary tones (L%, H%, HL%; LH% is also included in Venditti 2000).
As in English, there are two hierarchical levels of tonally defined phrases in Tokyo
Japanese: accentual phrases (AP) < intonational phrases (IP). In Tokyo Japanese, pitch
accents are associated with lexically accented syllables, and the intonation pattern of the AP
depends on the distribution of lexical accents. Some words are lexically accented, and
some are unaccented in Tokyo Japanese. Consequently, the AP containing an accented
word is accented, and the AP containing an unaccented word is unaccented. Consider the
contrast between the two accented forms kaki' (accent on the second mora) ‘fence-NOM’
and ka'ki (accent on the first mora) ‘oyster-NOM’ and the unaccented form kaki (no accent)
‘persimmon-NOM’:
Accented APs
    H*+L
     |
  kaki'-ga    kaki ‘fence’ (lexical pitch accent specified on ki)
  H*+L
   |
  ka'ki-ga    kaki ‘oyster’ (lexical pitch accent specified on ka)
Unaccented AP
  kaki-ga[3]  kaki ‘persimmon’ (no lexical pitch accent)
(adapted and modified from Shibuya 1997)
In this model, as summarized in Venditti (1995), words may group together into an AP
delimited by two tones: a H- phrasal tone and a L% boundary tone. That is, a H- phrasal
tone occurs near the beginning of an AP, and a final L% boundary tone marks its end.
marking the end. The H- phrasal tone is marked on both unaccented and accented APs, but
it is not marked when H- is indistinguishable from the high tone of the lexical accent
(H*+L) in an accented AP. The end of an AP is indicated by a L% boundary tone, which
also serves to mark a fall tonal movement at the end of the IP when the end of the IP is also
the end of an AP.
As in English, different sequences of tone types directly affect F0 patterns in Japanese.
For example, other things being equal, the accented AP has a higher F0 maximum than the
unaccented AP (Beckman & Pierrehumbert 1986; Pierrehumbert & Beckman 1988). Also,
different phrasing types affect the distribution of duration. Japanese tone types do not
affect duration values (Homma 1981; Beckman 1986; Sugito 1982a, 1996), probably due
to the strong immalleability of the mora duration. However, the IP-final syllable is often
subject to phrase-final lengthening; consequently, this syllable is realized with much longer
duration than other syllables within the same IP (see the following discussion on phrase-final lengthening).
Since duration and F0 are influenced by the tonal organization of phrases in a language-specific manner, intonation patterns will be matched across sentences and speakers in our
experiments.
[3] -ga is the nominative marker in Japanese.
1.5.2. Segment types
Each segment has intrinsic prosodic characteristics (see Lehiste 1970, Beckman 1986 for
review). “Other things being equal, a higher vowel generally has a higher F0 than a lower
vowel, and the effect has been found in enough languages that there must be a physical
basis” (Beckman 1986, p. 129). Also, different vowels have different intrinsic durations.
“Other things being equal, a low vowel will be longer than a high vowel” (Beckman 1986,
p. 141; also see Lehiste 1970). This relationship between vowel length and vowel height
has been shown for typologically different languages including English (e.g., Peterson &
Lehiste 1960; Lehiste 1970; Umeda 1975; van Santen 1992) and Japanese (e.g. Han 1961;
Nishimura 1979). A similar relationship has been found for intrinsic amplitude: “other
things being equal, a low vowel has higher peak overall intensity than does a high vowel”
(Beckman 1986, p. 142). This is also attested for various languages including English
(e.g., Lehiste & Peterson 1959) and Japanese (Homma, 1973). In addition, the acoustic
patterns of segments vary greatly depending on the types of adjacent segments (for F0, see
Beckman 1986, for duration, see van Santen 1992). Therefore, in the present study we
will control for intrinsic and neighboring segment effects on the target vowels.
1.5.3. Phrase-final lengthening
Many if not all languages display phrase-final lengthening at the ends of prosodic units at
some level. In English, it is reported that the amount of phrase-final lengthening reflects at
least four levels of the prosodic hierarchy (Wightman et al. 1992): prosodic word <
accentual phrase4 < intermediate phrase (ip) < intonational phrase (IP). In Japanese, a
similar type of correspondence is found, but which prosodic levels correspond to a certain
amount of phrase-final lengthening is still controversial.
For example, Beckman & Pierrehumbert (1988) have found that both intermediate and
intonational phrases are marked by phrase-final lengthening, but they recognize that this
finding should be confirmed by perception tests. The results of Ueyama’s (1999) production
experiment show significant effects only at the end of the IP, but not at the end of the
intermediate phrase. A number of studies of Japanese duration patterns based on large
speech corpora also report that the mora at the end of a breath group5 is longer (Takeda
et al. 1989; Kaiki et al. 1990; Campbell 1992; Kaiki & Sagisaka; Venditti & van Santen 1998).
4 The existence of an accentual phrase in English was originally hypothesized by Beckman and
Pierrehumbert (1986). However, a later study (Beckman & Edwards 1990) could not find any phonetic
correlates of this prosodic unit in English. In Wightman et al. (1992), this term refers to a grouping of
words within an intermediate phrase.
5 The breath group corresponds to the IP or the utterance.
It was also reported that phrase-final lengthening spreads regressively to non-final
syllables in Hebrew (Berkovits 1993) and Dutch (Cambier-Langeveld 2000). A similar
effect on the penultimate syllable of a phrase has been observed in English (Ueyama 1995),
so it seems that the distance between a target syllable and a boundary can affect duration at
least in the case of English. In the present study, phrase-final lengthening and the
regressive spreading of this effect were controlled by not putting the target vowels at the
edges of prosodic units, and also by matching the number of segments and syllables
between the target syllables and the boundary.
1.5.4. Foot size and phrase size
As briefly discussed in earlier sections, it has been claimed that English stress-timing is
characterized by the constant duration of the stress-foot. This implies that constituent
segments are more compressed as the foot size increases. This shortening effect is called
stress-timed shortening. Later studies have found that this does not generally occur (e.g.,
for experimental evidence, Nakatani et al. 1982, Ueyama 1995; for review, Lehiste 1977,
Kawasaki 1983), but it is possible that some speakers have some tendency toward this
pattern. Thus, in the present study the number of syllables between two stressed syllables
(i.e., interstress interval) was kept constant in the neighborhood of the target syllables.
The size of the prosodic units also affects the properties of prosodic organization. For
example, Jun (1993) says that prosodic phrasing is sensitive to multiple factors including
“heaviness of the phrase, that is, to the number of syllables within the phrase” (p. 180). If
the phrase size is too large, the phrase tends to break up into two smaller phrases. A
similar effect was also found in Japanese phrasing (Kubozono 1993; Maekawa 1994).
Phrase size also affects the duration of constituent segments. Earlier studies have reported
that in Japanese vowels are shorter as the length of phrases increases (e.g., Homma 1982;
Kaiki and Sagisaka 1992). Furthermore, Ueyama and Jun (1998) have shown that less
proficient L2 speakers tend to divide utterances into smaller phrases. So it is expected that
phrase size will influence both segmental duration and prosodic grouping of speech in both
L1 and L2 speech. For this reason, in the present study target syllables were embedded in
phrases of similar sizes.
1.6. Structure of the present study
In the present study, the data of seven production experiments are analyzed in order to
investigate language transfer in the three prosodic domains discussed earlier:
Investigated prosodic phenomena:
1) the contrast between lexically accented and unaccented vowels (Experiments 1–3)
2) the contrast between English tense vs. lax vowels and between Japanese short vs.
long vowels (Experiments 4 & 5)
3) temporal organization across syllables (Experiments 6 & 7)
Moreover, a phonological survey was organized to test whether speakers become aware
of the minimal segmentation unit in L2. The results of the phonological survey will be
analyzed together with the results of the production experiments.
Chapter 2:
Word Accent Production in L2 English and L2 Japanese
2.1. Experiments 1 & 2: Word accent production in neutral declaratives
2.1.1. Goal
Two experiments were conducted in order to investigate the effect of L1 phonetic habits on
L2 word accent production. Experiments 1 and 2 examined L2 English produced by native
speakers of Tokyo Japanese and L2 Japanese produced by native speakers of American
English, respectively, focusing on the acoustic patterns of word accent in neutral
declarative sentences.
2.1.2. Word accent realization in L1 English vs. L1 Japanese
Both English and Japanese have word accent whose position is specified at the lexical level:
stress for English (e.g., stress on the first syllable in MUsic and on the second syllable in
beLOW) and pitch accent for Japanese (e.g., pitch accent on the first syllable in ha'shi
‘chopsticks’ and on the second syllable in hashi' ‘bridge’).
There are similarities and
differences between the two languages in terms of the manipulation of acoustic correlates in
word accent production (see Section 1.4.1. for a review). English stress and Japanese
pitch accent are similar in the sense that F0 is an essential correlate in word accent
production.
In both languages, lexically accented syllables show higher F0 than
unaccented syllables6.
6 This is always true in Japanese, but true in English only if a stressed syllable receives no accent or does
not receive a post-lexical low tone accent (i.e., only if it receives a high tone accent), as discussed in
Section 1.5.1.
However, the two languages differ in terms of the manipulation of duration. In
English, stressed syllables are longer than unstressed syllables (i.e., duration is an active
correlate in word stress production).
In contrast, in Japanese, there is no systematic
difference in duration between accented and unaccented syllables (i.e., duration is an inert
correlate in pitch accent production). However, remember that length is a phonemic feature
in Japanese: short and long segments contrast phonemically (oki ‘off shore’ vs. ooki ‘(last
name)’; kata ‘form’ vs. katta ‘bought’). Therefore, it can be said that duration is active
phonologically in Japanese even though not phonetically active in pitch accent production in
the sense that there is no significant durational difference between lexically accented and
unaccented syllables when the other factors are kept equal.
The manipulation of F0 and duration in English and Japanese for word accent
production in neutral declarative sentences can be summarized as follows (YES and NO
indicate the active and inactive roles of these two acoustic correlates in word accent
production, respectively):
               F0     Duration
L1 English     YES    YES
L1 Japanese    YES    NO
2.1.3. Expected patterns in L2 Japanese and L2 English
F0 patterns
The aforementioned comparison of L1 English and L1 Japanese in terms of the
manipulation of F0 and duration leads us to expect the following L2 patterns at least in the
initial stages of the development of L2 Japanese–L1 English and L2 English–L1 Japanese.
The active role of F0 in L1 Japanese and L1 English probably facilitates the production of a
F0 contrast between lexically accented and unaccented syllables in L2 English and L2
Japanese, respectively, i.e., phonetic habits are expected to positively transfer to L2 word
production.
The effect of the active role of F0 in L1 Japanese was observed in previous studies of
L2 English–L1 Japanese both in production and perception (for production, Shibuya 1997;
for perception, see Beckman 1986, Watanabe 1987).
It was found that F0 plays a
dominant role in the production and perception of English stress by Japanese speakers,
probably due to the transfer of L1 Japanese features. As far as I know, there is no study of
L2 Japanese–L1 English from a production point of view, and only Beckman’s study
(1986) has investigated the case of perception.
Beckman compared monolingual
Americans with Americans who had lived in Japan for at least one year. Her
results showed that exposure to authentic Japanese was positively correlated with the use of
F0 as the main cue to the perception of Japanese accents. The case of production will be
investigated in Experiment 2 of the present study.
Duration patterns
In order to learn the Japanese system, native speakers of English need to know how to
suppress duration, i.e., an active correlate in their L1 accent system, in their L2 Japanese
production. In contrast, native speakers of Japanese learning English need to learn how to
activate duration, i.e., an inert correlate in their L1 accent system.
The manipulation of F0 and duration in each L1 type and what needs to be learned to
produce native-like patterns in each L2 type can be summarized in the following way:
               F0     Duration
L1 English     YES    YES   (suppress in L2 Japanese)
L1 Japanese    YES    NO    (activate in L2 English)
Ueyama’s (1996) study presented acoustic evidence of a positive correlation between the
duration ratio of stressed/unstressed vowels and L2 proficiency levels in L2 English-L1
Japanese: i.e., more experienced Japanese speakers of L2 English showed a greater
duration ratio of stressed/unstressed vowels.
We expect that Experiment 1 of the present
study will confirm this tendency in the production of English stress by Japanese speakers.
As already mentioned, Beckman (1986) examined the perception of Japanese accents
by Americans by assessing the relative perceptual salience of the four acoustic correlates of
word accents (F0, duration, amplitude and vowel quality), in comparison with the
perception of English stress by the same subjects. For L1 English, she found that all of the
four acoustic parameters showed significant effects in the perception of English stress7. If
we compare the use of the four acoustic correlates in L1 English and L2 Japanese–L1
English presented in Beckman’s data, we find that exposure to authentic Japanese tends to
be positively correlated with the suppression of the use of duration in the perception of
Japanese accent by English speakers. In Experiment 2, we will test whether a similar
correlation is observed in the production of Japanese accent by English speakers.
7 This result agrees with the findings of Nakatani and Aston (1978).
2.1.4. Speech materials
Experiment 1 (English)
In this experiment, four pairs of English nouns and verbs were compared: CONtract vs.
conTRACT; DIgest vs. diGEST; PERmit vs. perMIT; SUBject vs. subJECT. For each
pair, the noun form has word stress on the first syllable, while the verb form has word
stress on the second syllable, and the two forms are segmentally homophonous except for
the vowel quality of the first vowel (the first vowel of the verb form is typically reduced)8.
One pair (PERmit vs. perMIT) has a reduced second syllable in the noun. A context
sentence and a frame sentence with the same target word were presented in order to elicit
the expected stress patterns:
Context:  I read Reader’s Digest.
Frame:    I said DIgest this time.
Context:  Cows can digest grass.
Frame:    I said diGEST this time.
8 The verb diGEST has two different pronunciations: [daIdZEst] and [dIdZEst]. All participants in
Experiment 1 produced this verb with the former pronunciation, so, for our purposes, the noun form and the
verb form of “digest” can be considered segmentally homophonous.
In this context, we expected nuclear accent (i.e., sentence stress) on the stressed
syllable of the target word of the frame sentence. For each word pair, F0 and duration
of the first vowels were measured and statistically analyzed to compare patterns in
stressed vs. unstressed conditions:
      stressed    vs.   unstressed
e.g.  DIgest            diGEST
      CONtract          conTRACT
Experiment 2 (Japanese)
In this experiment, 3 pairs of segmentally homophonous Japanese words were compared:
ki'ru ‘cut’ vs. kiru' ‘wear’; sa'too ‘(name)’ vs. sato'o ‘sugar’; to'Si ‘city’ vs. toSi'
‘age/year’. In each pair, the first form has a lexical pitch accent or word accent on the first
syllable, while the second form has a word accent on the second syllable. The target word
was presented in the following frame sentence:
Sosite ____ to iimasu.   (‘Next, I’ll say ____.’)
A context sentence was not needed in the Japanese experiment, since different words are
spelled differently.
For each word pair, F0 and duration of the first vowels were measured and
statistically compared in the accented vs. unaccented condition:
      accented    vs.   unaccented
e.g.  ki'ru             kiru'
      sa'too            sato'o
2.1.5. Subjects
Experiment 1 (English)
The set of speakers for Experiment 1 included one control group and two experimental
groups. The control group consisted of 4 native speakers of American English (NE1,
NE2, NE3 and NE4). They were all male speakers except NE3. They were all UCLA
undergraduate students at the time of recording. The two experimental groups consisted of
8 native speakers of Japanese learning English (L2 English speakers, henceforth): 4
speakers for each experimental group (AE1, AE2, AE3 and AE4 for the advanced learner
group; BE1, BE2, BE3 and BE4 for the beginning learner group). The background
information for all Japanese speakers of L2 English is summarized in Table 2-1.
Table 2-1: Background information of L2 English speakers in Experiment 1
Speaker  Age  Gender  Years of residence  Age of arrival  Age of beginning of   Duration of English
                      in the US           in the US       English instruction   instruction
AE1      31   female  6 years             25              12                    13 years 11 months
AE2      28   female  7 years             21              13                    13 years
AE3      24   male    7 years             17              13                    9 years 6 months
AE4      29   female  11 years            18              13                    10 years 3 months
BE1      21   male    0                   ---             13                    8 years
BE2      22   male    0                   ---             13                    8 years
BE3      22   male    0                   ---             13                    8 years
BE4      22   female  0                   ---             12                    10 years 9 months
The criterion used to select speakers for the two L2 speaker groups was the number of
years of residence in the United States. The four advanced Japanese speakers of L2
English had been staying in the United States for more than 5 years, while the four beginning
speakers had never stayed in English-speaking countries for more than 3 months (in Table
2-1, zero indicates that the speaker resided in an English-speaking country for less than 3
months). At the time of data collection, all four advanced speakers were UCLA students,
while all four beginning speakers were college students in Tokyo, Japan. AE3, BE1–3
were male speakers, and the others were females. All 8 Japanese speakers of L2 English
speak Tokyo dialect as their mother tongue9.
9 It is important to keep the same dialectal background for Japanese speakers learning English, since
Japanese dialects greatly differ in terms of tonal patterns.
Experiment 2 (Japanese)
The control group for Experiment 2 consisted of 4 native speakers of Japanese (Tokyo
dialect) who were college students in Tokyo, Japan (NJ1, NJ2, NJ3 and NJ4). They were
all male speakers except NJ4. The experimental groups consisted of 7 native speakers of
American English learning Japanese (L2 Japanese speakers, henceforth). As in Experiment
1, the number of years of residence in the country of the target language (here, Japan) was
used to group L2 Japanese speakers into the advanced and beginning learner groups. The
background information for all seven speakers of L2 Japanese is summarized in Table 2-2.
The three advanced speakers of L2 Japanese (AJ1, AJ2 and AJ3) had been staying in
Japan for more than 4 years, while the beginning speakers of L2 Japanese (BJ1, BJ2, BJ3
and BJ4) had never stayed in Japan for more than 3 months (in Table 2-2, zero indicates that
the speaker’s stay in Japan lasted less than 3 months). AJ1, AJ2,
BJ1, BJ2 and BJ4 were male speakers, while AJ3 and BJ3 were female speakers. All
beginning speakers of L2 Japanese were undergraduates studying Japanese as a foreign
language at UCLA.
Table 2-2: Background information of L2 Japanese speakers in Experiment 2
Speaker  Age  Gender  Years of residence  Age of arrival  Age of beginning of    Duration of Japanese
                      in Japan            in Japan        Japanese instruction   instruction
AJ1      37   male    11 years            26              25                     3 years 9 months
AJ2      26   male    4 years             20              18                     9 years
AJ3      30   female  4 years             26              19                     5 years
BJ1      22   male    0                   ---             19                     2 years 6 months
BJ2      18   male    0                   ---             14                     4 years 7 months
BJ3      24   female  0                   ---             14                     6 years 6 months
BJ4      24   male    0                   ---             21                     2 years 6 months
Differences in learning backgrounds between L2 English and L2 Japanese
Even though the proficiency groups in each L2 type were determined on the basis of the
same criterion (the years of residence), the learning background of the speakers of the two
L2 types is not matched in terms of two factors which may significantly affect their
performance in the experiments: the age at which they began to learn their L2, and the type
of instruction they received in L2.
All 8 Japanese speakers of L2 English in Experiment 1 started learning English at the
age of 12 or 13 (see Table 2-1), continuing in junior and senior high schools. They went through curricula with
a strong emphasis on grammar and reading comprehension, having teachers whose native
language was Japanese, not English.
In contrast, five out of the seven L2 Japanese
speakers in Experiment 2 began to learn Japanese in college (except BJ2 and BJ3, who
started in junior high school). In addition to Japanese grammar, all seven L2 Japanese
speakers have been learning speaking and listening skills from teachers whose mother
tongue was Japanese. Thus, in summary, L2 English speakers in Experiment 1 began to
learn their L2 earlier than L2 Japanese speakers in Experiment 2 did; however, L2 English
speakers received less authentic input in their target language than L2 Japanese speakers.
It would be ideal to avoid these differences, but realistically, it is hard to perfectly
match the learning background of L2 English and L2 Japanese groups, since the social
status of the two target languages is different in Japan and the United States. Still, it is
important to be aware of the aforementioned differences when we later interpret the data
collected.
2.1.6. Procedure
Recording
For each experiment, sentences with target words were mixed with foil sentences with
words different from target words. Sentences in each reading of the list were pseudorandomized in 10 different orders. The first reading was not analyzed. In the recording
session, PsyScope was used to present sentences. One sentence was displayed on the
computer screen at a time.
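The list preparation described above can be illustrated with a minimal sketch. It assumes that pseudo-randomization means a plain seeded shuffle of targets and foils together; the function name reading_orders, the seed, and the foil sentences are hypothetical and are not taken from the experiment scripts, which were run in PsyScope.

```python
import random


def reading_orders(sentences, n_orders=10, seed=0):
    """Return n_orders shuffled copies of the sentence list.

    A seeded generator makes the orders reproducible. With very short
    lists two orders can coincide; no check for that is made here.
    """
    rng = random.Random(seed)
    orders = []
    for _ in range(n_orders):
        order = list(sentences)  # copy so the input list is left untouched
        rng.shuffle(order)
        orders.append(order)
    return orders


# Two target frames from the text plus two invented foil sentences.
example_sentences = [
    "I said DIgest this time.",
    "I said diGEST this time.",
    "I said window this time.",   # hypothetical foil
    "I said garden this time.",   # hypothetical foil
]
orders = reading_orders(example_sentences, n_orders=10, seed=1)
analyzed_readings = orders[1:]  # the first reading is not analyzed
```

Under these assumptions, each speaker contributes up to nine analyzable repetitions of every sentence.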
The subjects were given sufficient time to practice the speech materials. They were
asked to read sentences without hesitations or pauses in the middle. Data were recorded in
the recording booth of the UCLA phonetics lab for L1 English, experienced L2 English,
and L2 Japanese groups, and in the recording room of Meiji Gakuin University
Information Center in Tokyo for L1 Japanese and inexperienced L2 English groups.
Measurements
Collected data were digitized with Kay Elemetrics’s CSL at a 10 kHz sampling rate.
Scicon’s PitchWorks was used to measure F0 and duration values. Tokens were not
analyzed if:
• there were errors in accent or stress placement on a target word
• there were hesitations or pauses in the middle of the sentence
• there were exceptional sequences of phonological tones
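The three exclusion criteria can be expressed as a simple filter. This is only a sketch: the Token record and its field names are hypothetical labels for judgments that were made by inspecting each recording, not fields produced by any measurement tool.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    accent_error: bool     # accent or stress placed on the wrong syllable
    disfluent: bool        # hesitation or pause in the middle of the sentence
    atypical_tones: bool   # exceptional sequence of phonological tones


def is_analyzable(token):
    """A token is kept only if none of the three criteria applies."""
    return not (token.accent_error or token.disfluent or token.atypical_tones)


# Invented tokens for illustration.
tokens = [
    Token("DIgest", False, False, False),
    Token("diGEST", True, False, False),
    Token("PERmit", False, True, False),
    Token("perMIT", False, False, False),
]
analyzed = [t for t in tokens if is_analyzable(t)]
```

Because tokens are dropped individually, the number of analyzed tokens can differ from speaker to speaker.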
Duration was measured for the first vowel of a target word, using waveforms and
wide-band spectrograms. For the first syllable of contract and permit in Experiment 1, the
duration of the whole syllable rhyme (nucleus vowel + coda consonant) was measured
because of segmentation difficulties. F0 was measured at the center of the first syllable,
using a pitch extraction display.
Statistical Analysis
Obtained values of F0 and duration were analyzed using two-factor ANOVAs and
Scheffe’s post-hoc tests. The independent variables of the ANOVAs were word accent and
word pair. The focus of Experiments 1 and 2 is on the effect of word accent on F0 and
duration (stressed vs. unstressed conditions in Experiment 1; accented vs. unaccented
conditions in Experiment 2). The effect of word pair was included in the ANOVAs in
order to control for the variance generated by this factor.
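To make the logic of the test concrete, the following sketch computes the F ratio for the word accent factor in a two-factor design. It assumes a balanced design (the same number of tokens in every accent by word pair cell) and an error term based on within-cell variation. The function name accent_f_ratio and the F0 values are illustrative only; with unequal token counts after exclusions, a statistics package would be needed instead.

```python
from statistics import mean


def accent_f_ratio(cells):
    """F ratio for the accent main effect in a balanced two-factor design.

    `cells` maps (accent, word_pair) to a list of measurements.
    Returns (F, df_effect, df_error).
    """
    accents = sorted({accent for accent, _ in cells})
    pairs = sorted({pair for _, pair in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(accents) * len(pairs):
        raise ValueError("every accent x word pair cell must be present")
    if n < 2 or any(len(values) != n for values in cells.values()):
        raise ValueError("balanced design with at least 2 tokens per cell required")

    grand = mean(x for values in cells.values() for x in values)

    # Between-level sum of squares for the accent factor.
    ss_accent = 0.0
    for accent in accents:
        level = [x for pair in pairs for x in cells[(accent, pair)]]
        ss_accent += len(level) * (mean(level) - grand) ** 2

    # Within-cell (error) sum of squares.
    ss_error = sum((x - mean(values)) ** 2
                   for values in cells.values() for x in values)

    df_effect = len(accents) - 1
    df_error = len(accents) * len(pairs) * (n - 1)
    f_value = (ss_accent / df_effect) / (ss_error / df_error)
    return f_value, df_effect, df_error


# Invented F0 values (Hz): two word pairs, two tokens per cell.
example_cells = {
    ("stressed", "contract"): [120.0, 124.0],
    ("unstressed", "contract"): [100.0, 104.0],
    ("stressed", "digest"): [130.0, 134.0],
    ("unstressed", "digest"): [110.0, 114.0],
}
f_value, df_effect, df_error = accent_f_ratio(example_cells)
```

Including word pair as a factor removes the variance between word pairs from the error term, which is the purpose stated above.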
2.1.7. Results of Experiment 1 (English)
F0 patterns
For each L1 English speaker, the F0 mean and standard deviation of stressed and
unstressed vowels are plotted in Figure 2-1. The results show that all four speakers of L1
English produced a greater F0 mean in the stressed condition than in the unstressed
condition for all four tested English word pairs.
Figure 2-1: F0 means & standard deviations
of stressed vs. unstressed vowels for L1 English
[Figure: four panels (NE1, NE2, NE3, NE4); y-axis: F0 (Hz); x-axis: word pair (contract, digest, permit, subject); bars: stressed vs. unstressed]
(* = significantly different at α = 0.01)
We conducted ANOVAs in order to investigate the effect of word accent (stressed vs.
unstressed target vowels) on the F0 values of the tested vowels. The results showed that
the effect of word accent was statistically significant for the data of every L1 English
speaker (p < 0.0001).
ANOVA results for each speaker are presented in Table 2-3.
Scheffe’s post-hoc tests were conducted to test differences in F0 means between the
stressed and unstressed vowel in each word pair.
Word pairs showing significant
differences are marked by an asterisk (α = 0.01) in Figure 2-1.
If we examine the
distribution of significant differences between stressed and unstressed means, we find two
patterns across the four speakers. NE2 and NE4 show significantly higher F0 in the
stressed condition for all 4 word pair types. NE1 and NE3 show this pattern for 2 out of 4
word pair types; however, stressed means are still greater in the two pairs showing no
significant difference, indicating a trend for a greater mean in the stressed condition. These
results suggest that in L1 English, overall, stressed vowels with a H* accent are
significantly higher in F0 than unstressed vowels.
Table 2-3: ANOVA results for F0 data of L1 and L2 English in Experiment 1
Group                  Speaker  F                    p
L1 English             NE1      F(1, 59) = 22.13     p < .0001
                       NE2      F(1, 47) = 110.37    p < .0001
                       NE3      F(1, 42) = 27.69     p < .0001
                       NE4      F(1, 53) = 86.76     p < .0001
Advanced L2 English    AE1      F(1, 62) = 1090.42   p < .0001
                       AE2      F(1, 62) = 1995.15   p < .0001
                       AE3      F(1, 59) = 101.01    p < .0001
                       AE4      F(1, 63) = 269.05    p < .0001
Beginning L2 English   BE1      F(1, 61) = 1319.37   p < .0001
                       BE2      F(1, 51) = 208.03    p < .0001
                       BE3      F(1, 63) = 151.31    p < .0001
                       BE4      F(1, 57) = 581.60    p < .0001
All Japanese speakers of L2 English showed consistently higher F0 for stressed vowels
in all 4 word pairs. In Figure 2-2, F0 means and standard deviations are shown for
representative speakers of the advanced and beginning L2 English groups (AE1 and BE3,
respectively). The results of a series of ANOVAs showed a significant effect of word
accent on the F0 values of the target vowels (p < 0.0001) in the F0 data of each L2 English
speaker. ANOVA results for each speaker are presented in Table 2-3. According to the
results of the Scheffe’s post-hoc tests (α = 0.01), the mean difference between stressed and
unstressed vowels was always statistically significant. Word pairs showing significant
differences are marked by an asterisk (α = 0.01) in Figure 2-2.
Figure 2-2: F0 means & standard deviations
of stressed vs. unstressed vowels for Speakers AE1 and BE3
[Figure: two panels (AE1, BE3); y-axis: F0 (Hz); x-axis: word pair (contract, digest, permit, subject); bars: stressed vs. unstressed]
(* = significantly different at α = 0.01)
Additionally, F0 ratios of stressed to unstressed vowels were computed for each
speaker’s data by dividing the F0 value of each stressed vowel token by that of the
corresponding unstressed token for each repetition of each word pair type. For each
speaker, the obtained ratio values were pooled across all word pairs and repetitions, and the
mean ratio and standard deviation were computed. Results are summarized in Figure 2-3.
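The ratio computation can be sketched as follows. The function name pooled_ratio and the F0 values are invented for illustration, and the sketch assumes that each stressed token has already been matched with the unstressed token from the same repetition.

```python
from statistics import mean, stdev


def pooled_ratio(stressed, unstressed):
    """Return (mean, SD) of token-wise stressed/unstressed ratios.

    Both arguments map a word pair to a list of values, one per
    repetition, so that item i in each list comes from repetition i.
    """
    ratios = []
    for pair, s_values in stressed.items():
        u_values = unstressed[pair]
        if len(s_values) != len(u_values):
            raise ValueError(f"unequal repetition counts for {pair!r}")
        ratios.extend(s / u for s, u in zip(s_values, u_values))
    # Pool across word pairs and repetitions before averaging.
    return mean(ratios), stdev(ratios)


# Invented F0 values in Hz for one speaker.
stressed_f0 = {"contract": [120.0, 130.0], "digest": [150.0, 110.0]}
unstressed_f0 = {"contract": [100.0, 100.0], "digest": [100.0, 100.0]}
ratio_mean, ratio_sd = pooled_ratio(stressed_f0, unstressed_f0)
```

A mean ratio of 1.0 corresponds to no difference between the two conditions, and values above 1.0 indicate higher values in the stressed condition.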
Figure 2-3: Average F0 ratio of English stressed/unstressed vowels
[Figure: mean F0 ratio (y-axis, 0 to 2) for each speaker, grouped as L1 English (NE1–NE4), advanced L2 English (AE1–AE4) and beginning L2 English (BE1–BE4)]
Figure 2-3 shows that L1 English speakers overall show smaller F0 ratios than beginning
Japanese speakers of L2 English. We find three patterns in advanced L2 English data: AE3
shows a ratio in the range of L1 English ratios; AE4 shows a ratio in the range of ratios
produced by beginning L2 English; finally, AE1 and AE2 show the greatest ratios. On the
basis of these data, we can conclude that there is no systematic relation between L2
proficiency and the F0 ratios of stressed/unstressed vowels.
Duration patterns
All four L1 English speakers showed the same pattern: stressed vowels were
consistently longer than unstressed vowels in all four word pairs. An example of this
is shown in Figure 2-4, where means and standard deviations of stressed and
unstressed vowels of a representative speaker (NE1) are presented.
Figure 2-4: Duration means and standard deviations of stressed and
unstressed vowels for Speaker NE1 (L1 English)
[Figure: one panel (NE1); y-axis: duration (ms); x-axis: word pair (contract, digest, permit, subject); bars: stressed vs. unstressed]
(* = significantly different at α = 0.01)
The results of a series of ANOVAs showed that the effect of word accent on vowel
duration was statistically significant for the data of every L1 English speaker (see Table 2-4
for ANOVA results for each speaker). According to the results of a series of Scheffe’s
post-hoc tests (α = 0.01), the difference in duration of the stressed vs. unstressed condition
was statistically significant for all 4 tested word pairs and all 4 L1 English speakers.
Table 2-4: ANOVA results for duration data of L1 English in Experiment 1
Speaker  F                   p
NE1      F(1, 61) = 156.15   p < .0001
NE2      F(1, 59) = 470.19   p < .0001
NE3      F(1, 53) = 406      p < .0001
NE4      F(1, 54) = 241.17   p < .0001
The durational contrast between stressed and unstressed vowels can be quantified by
computing duration ratios of stressed to unstressed vowels. For all speakers, the method
used to compute F0 ratios in Figure 2-3 was also used for duration ratios. Results are
summarized in Figure 2-5.
Figure 2-5: Average duration ratio of English stressed/unstressed vowels
[Figure: mean duration ratio (y-axis, 0 to 3) for each speaker, grouped as L1 English (NE1–NE4), advanced L2 English (AE1–AE4) and beginning L2 English (BE1–BE4); a reference line at 1 marks stressed = unstressed (no difference)]
For the four speakers of L1 English, the mean ratios approximately range from 1.6 (NE1)
to 2.0 (NE2). In contrast, the ratios of beginning L2 English speakers are clustered around
1.0, which indicates that there is no durational difference between stressed and unstressed
vowels. Advanced L2 English speakers show ratio magnitudes somewhere in the middle
between L1 English and beginning L2 English.
These results suggest the following
generalization: advanced Japanese speakers of L2 English produce native-like duration
patterns more reliably than beginning speakers, showing greater duration ratios of
stressed/unstressed vowels.
The results show two points regarding the effect of transfer.
First, we found the
expected effect of negative transfer from L1 Japanese in that beginning Japanese learners of
English produced Japanese-like duration ratios of stressed/unstressed vowels.
Second,
there is a positive effect of learning, given that advanced Japanese learners show more
native-like ratio magnitudes.
2.1.8. Results of Experiment 2 (Japanese)
F0 patterns
Both L1 and L2 Japanese speakers showed consistently higher F0 for accented vowels in
all 3 tested word pairs. The results of a series of ANOVAs showed that the effect of word
accent on the F0 values of the tested vowels was statistically significant in the data of every
speaker (see Table 2-5 for ANOVA results for each speaker). According to the results of
Scheffe’s post-hoc tests (α = 0.01), the mean difference between accented and unaccented
vowels was always statistically significant for every speaker.
As in the analysis of the Experiment 1 data, F0 ratios were computed for all speakers.
For each speaker, the F0 value of each accented vowel token was divided by the value of
the corresponding unaccented token for each repetition of each word pair type. For each
speaker, obtained ratio values were pooled across all word pairs and repetitions, and mean
ratio and standard deviation were computed. Results are summarized in Figure 2-6.
Table 2-5: ANOVA results for F0 data of L1 and L2 Japanese in Experiment 2
Group                   Speaker  F                   p
L1 Japanese             NJ1      F(1, 45) = 241.26   p < .0001
                        NJ2      F(1, 34) = 250.4    p < .0001
                        NJ3      F(1, 41) = 104.14   p < .0001
                        NJ4      F(1, 43) = 183.1    p < .0001
Advanced L2 Japanese    AJ1      F(1, 32) = 133.53   p < .0001
                        AJ2      F(1, 28) = 124.05   p < .0001
                        AJ3      F(1, 59) = 122.35   p < .0001
Beginning L2 Japanese   BJ1      F(1, 30) = 98.2     p < .0001
                        BJ2      F(1, 26) = 72.84    p < .0001
                        BJ3      F(1, 33) = 122.95   p < .0001
                        BJ4      F(1, 23) = 92.09    p < .0001
Figure 2-6: Average F0 ratio of Japanese accented/unaccented vowels
[Figure: mean F0 ratio (y-axis, 0 to 2) for each speaker, grouped as L1 Japanese (NJ1–NJ4), advanced L2 Japanese (AJ1–AJ3) and beginning L2 Japanese (BJ1–BJ4)]
The comparison of individual ratios across the three speaker groups shows no systematic
correlation between L2 Japanese proficiency and F0 ratios, indicating that the F0
characteristics of L1 English facilitate L2 Japanese produced by L1 English speakers
regardless of their L2 proficiency levels. This shows the effect of positive transfer.
Duration patterns
For each L1 Japanese speaker, duration means and standard deviations of accented and
unaccented vowels are plotted in Figure 2-7.
Figure 2-7: Duration means & standard deviations
of accented vs. unaccented vowels for L1 Japanese
[Figure: four panels (NJ1, NJ2, NJ3, NJ4); y-axis: duration (ms); x-axis: word pair (kiru, satoo, toshi); bars: accented vs. unaccented; an asterisk marks the kiru pair for NJ4]
In Experiment 1, we found that all L1 English speakers produced stressed vowels
systematically longer than unstressed vowels for all tested English word pairs. This is not
the case in L1 Japanese production. In Figure 2-7, we can observe some cases in which
the mean duration of accented vowels is greater than the mean of unaccented vowels, but
this pattern is not consistent within the production of any L1 Japanese speaker. I
conducted ANOVAs for the data of each L1 Japanese speaker, in order to examine the
effect of word accent on vowel duration. The results showed no significant effect (α =
0.01) in the data of any of the four L1 Japanese speakers (see Table 2-6 for ANOVA
results of each speaker). Scheffe’s post-hoc tests were also conducted in order to test the
significance of the difference of mean duration between accented and unaccented position
for each word pair and each L1 Japanese speaker (α = 0.01). In only one instance were the
accented and unaccented means significantly different (i.e., the kiru pair in NJ4’s
production), indicated by an asterisk in Figure 2-7. These results lead to the conclusion
that in L1 Japanese (unlike in L1 English), duration does not play an essential role in
differentiating the production of accented and unaccented monomoraic vowels.
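For readers who wish to see how an F statistic of the form F(1, n) arises, the following sketch computes a one-way ANOVA by hand. It assumes a design with word accent as the only factor, which the text does not state explicitly; the duration values are invented and the function name one_way_anova is hypothetical. The p value is left out because it requires the F distribution, which is not part of the Python standard library.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists of measurements, one list per condition.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the tokens around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical vowel durations in ms for one speaker (invented numbers).
accented = [62.0, 58.0, 65.0, 60.0]
unaccented = [59.0, 61.0, 57.0, 63.0]

f_value, df1, df2 = one_way_anova([accented, unaccented])
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

With two conditions the first degree of freedom is always 1, and the second is the number of tokens minus 2, which is the shape of the values reported for each speaker.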
Table 2-6: ANOVA results for duration data of L1 Japanese in Experiment 2
Speaker   F                  p
NJ1       F(1, 45) = .25     .6195
NJ2       F(1, 34) = 2.14    .1527
NJ3       F(1, 41) = 5.8     .0206
NJ4       F(1, 45) = 3.2     .0822
Following the same method for computing F0 ratios of accented to unaccented vowels
which I used for Figure 2-6, duration ratios of accented to unaccented vowels were
computed for all L1 and L2 Japanese speakers. For each speaker, obtained ratio values
were pooled across all word pairs, and a mean ratio and standard deviation were computed.
Results are summarized in Figure 2-8. In this figure, the absence of a significant durational
contrast in L1 Japanese speech, which was already observed in Figure 2-7, is shown by
the fact that the ratios of NJ1-4 cluster around 1.0.
Figure 2-8: Average duration ratio of Japanese accented/unaccented vowels
[Bar chart of durational ratio (y-axis, 0 to 2) by speaker: L1 Japanese (NJ1-NJ4), advanced L2 Japanese (AJ1-AJ3), and beginning L2 Japanese (BJ1-BJ4); a reference line at 1 marks accented = unaccented (no difference).]
The following duration patterns were observed for L2 Japanese. In the production of
the seven speakers of L2 Japanese, the magnitudes of duration ratios are roughly grouped
into three ranges. The ratios of AJ2, AJ3 and BJ2 fall into the range of L1 Japanese ratios;
AJ1, BJ1 and BJ3 show the greatest duration ratios among all seven speakers of L2
Japanese; finally, BJ4 shows a ratio somewhere in the middle. The large ratios of AJ1,
BJ1 and BJ3 confirm the effect of the transfer of the durational features of L1 English
stress in their production of L2 Japanese accent.
Notice that AJ1 shows the greatest duration ratio among all seven L2 Japanese
speakers, and that BJ2 shows a ratio close to the range of L1 Japanese ratios. These
results suggest that, unlike in the case of L2 English produced by native speakers of
Japanese, there is no systematic correlation between duration ratios of accented/unaccented
vowels and L2 proficiency levels in L2 Japanese produced by native speakers of English.
2.1.9. Discussion of Experiments 1 and 2
The results of Experiments 1 and 2 are summarized in Table 2-7 and Table 2-8,
respectively, which outline how the F0 and duration correlates were manipulated in word
accent production.
Table 2-7: Summary of results of Experiment 1 (English)
            L1 English                          L2 English–L1 Japanese
F0          stressed > unstressed               stressed > unstressed
            (significant difference with 4      (significant difference for all cases)
            exceptions)
duration    stressed > unstressed               positive correlation between L2 English
            (significant difference for all     proficiency and durational contrasts
            cases)
Table 2-8: Summary of results of Experiment 2 (Japanese)
            L1 Japanese                         L2 Japanese–L1 English
F0          accented > unaccented               accented > unaccented
            (significant difference for all     (significant difference for all cases)
            cases)
duration    accented = unaccented               no systematic correlation between L2
            (no significant difference with     Japanese proficiency and durational
            one exception)                      contrasts
Manipulation of F0 and duration in L1 English vs. L1 Japanese
Results of Experiments 1 and 2 show that in both L1 English and L1 Japanese the F0
correlate is actively manipulated in the production of word accent. However, we also find
that an F0 contrast between accented and unaccented vowels is more consistently realized in
L1 Japanese than in L1 English.
Interestingly, the absence of consistency in the F0 contrast in L1 English is
compensated for by a robust durational contrast. These differences between the two languages
can possibly be explained in the following way. In Japanese, the distribution of lexical
pitch accent greatly determines the tonal structure of the utterance, and there is only one
type of pitch accent (H*+L). In this language, lexically accented syllables are
phonologically specified with high tones (immediately followed by low tones) and
phonetically realized with a higher F0 peak than unaccented syllables. In contrast, in
English, the position of lexical stress specifies the possible landmark for pitch accents, but
pitch accents can have either high or low tones. Therefore, if a word stress receives a pitch
accent with a low tone (e.g., L*), a syllable with this word stress and a low tone accent is
produced with lower F0 than an adjacent unstressed syllable. Importantly, even in this
tonal context stressed syllables are longer than unstressed syllables.[10] Thus, it is
reasonable to say that duration is a more stable correlate of word stress than F0.
Taken together, the manipulation of F0 and duration for word accent production can be
summarized for L1 English and L1 Japanese in the following manner:
• In English, both F0 and duration are actively manipulated, but duration
  plays a more stable role than F0 in realizing the acoustic contrast between
  stressed and unstressed vowels.
• In Japanese, only F0 is significantly manipulated.
Development of F0 manipulation in L2 English and L2 Japanese
The analysis of F0 patterns in Experiments 1 and 2 showed that lexically accented vowels
were consistently higher in pitch than unaccented vowels in both L2 English–L1 Japanese
and L2 Japanese–L1 English regardless of L2 proficiency levels. This suggests that the
active role of F0 in the L1 accent system positively transfers to both L2 English and L2
Japanese, as expected. Our L1 English data showed that, although F0 is actively
manipulated, the F0 contrast is not as consistently realized as the durational contrast in L1
English. Interestingly, this characteristic of L1 English does not transfer to L2 Japanese
produced by native speakers of English: all seven speakers of L2 Japanese in Experiment 2
realized significant F0 contrasts with good consistency. In other words, the F0 correlate,
which is manipulated actively but not consistently in L1 English, is manipulated actively
and consistently in the word accent production of L2 Japanese–L1 English.
[10] Also, a stressed syllable associated with a pitch accent with a low tone is still louder than an unstressed
syllable (although no data on this is available from Experiment 1 of the present study).
The data of Experiment 1 showed that all eight Japanese speakers of L2 English
produced English stress with significant F0 contrasts, as shown in Figure 2-3. There was
no systematic relation between L2 proficiency and the use of F0. This confirms the results
of previous studies (for production, see Shibuya 1997; for perception, see Beckman 1986,
Watanabe 1987), indicating that the F0 features of L1 Japanese largely facilitate the
production of English stress.
As reviewed in Section 2.1.3, Beckman’s (1986) study examined the role of the F0 cue
in the perception of L2 Japanese–L1 English. She found that exposure to authentic
Japanese was positively correlated with the use of F0 as the main cue in the perception of
Japanese accents by English speakers. Such a correlation, however, was not observed in
the production data of Experiment 2: in their production of Japanese accents, all of the
seven speakers of L2 Japanese used F0 as consistently as L1 Japanese speakers did,
regardless of L2 proficiency levels.
The difference in results between Beckman’s perception study and the present study
may be explained by the combination of the following two factors. First, the two studies
examined Americans with different language backgrounds. In Beckman’s study,
monolingual Americans with no knowledge of Japanese were compared with Americans
speaking Japanese who had lived in Japan for at least one year in the past. The present study
examined two groups of Americans learning Japanese: Americans with 2.5 years of
classroom knowledge of Japanese, who had never lived in Japan (the beginning group),
and Americans who had lived in Japan for at least four years (the advanced group). None of
Beckman’s monolingual subjects had any exposure to authentic Japanese in the past, while
all four Americans in the beginning group of the present study had listened to authentic
Japanese in the classroom for at least 2.5 years (see Table 2-2 for their Japanese backgrounds).
F0 is the dominant acoustic correlate of Japanese accent, carrying a significant degree of
perceptual salience in L1 Japanese speech, and it is likely that a few years of exposure to
authentic Japanese is sufficient for Americans to extract this feature of Japanese accents and
actually realize it in their Japanese production. Second, the different results in the two
studies may be simply attributed to a yet-to-be-identified difference between L2 perception
and L2 production.
Development of duration manipulation in L2 English and L2 Japanese
The results of Experiments 1 and 2 presented an asymmetrical pattern between the two L2
types in terms of the development of duration manipulation. For L2 English, advanced
Japanese speakers showed native-like durational patterns more reliably than beginning
speakers. However, for L2 Japanese there was no systematic difference in duration
manipulation between advanced and beginning speakers.
This asymmetry can be explained in the following way. In L1 English, duration is only
manipulated at the phonetic level: there is no phonemic length contrast, but stressed
syllables are predictably longer than unstressed syllables. Since duration plays no
phonemic role in L1 English, the manipulation of duration is only an unconscious phonetic
habit for native speakers of English. Thus, English speakers learning Japanese are
expected to have a hard time controlling duration when trying to suppress the durational
contrast between accented and unaccented syllables in their L2 Japanese speech. On the
other hand, in L1 Japanese duration is not actively manipulated in word accent production;
thus, there is no significant durational contrast between accented and unaccented syllables.
However, duration is active at the phonological level in this language, in the sense that
there is a phonemic length distinction between short and long segments (e.g., oki ‘off
shore’ vs. ooki ‘(last name)’). Short and long segments are categorically perceived as one
or two moras long by native speakers of Japanese. Therefore, it can be predicted that
Japanese speakers learning English are sensitive to the duration cue and become able to
control duration consciously as they have more learning experience.
2.2. Experiment 3: Word accent production after focus in L2 English
2.2.1. Prosodic context of the target word in Experiment 1: Nuclear position
In Experiment 1, the target word was embedded in nuclear position, where the word stress
is realized with the highest peak of pitch of the entire utterance, as indicated by Figure 2-9.
A stress in nuclear position is equivalent to what was labeled ‘sentence stress’ in the
traditional description of English prosody. The term nuclear (pitch) accent is commonly
employed in Intonational Phonology (Pierrehumbert 1980; Beckman and Pierrehumbert
1986; Ladd 1996). According to the results of Experiment 1, only advanced Japanese
speakers of L2 English were able to produce a significant durational contrast between
stressed and unstressed vowels in addition to an F0 contrast.
Figure 2-9: Realization of nuclear pitch accent in English
[Schematic pitch (F0) contour over the utterance "I said SUBject this time": the word stress of the target word (SUBject) is in nuclear position.]
2.2.2. Possible learning strategy in L2 English–L1 Japanese
It is possible that advanced Japanese speakers of L2 English in Experiment 1 used the
following strategy: they used an acoustic correlate which is already active in the production
of L1 word accent (i.e., F0), in order to learn to control a correlate which is not active in
L1 (i.e., duration). This hypothesis can be phrased in the following way:
Japanese learners of English first employ F0, an active correlate in their L1
Japanese accent system, in order to contrast lexically stressed and
unstressed vowels in their L2 English production, and later learn to use
duration, an inert correlate in their L1 system. By employing this strategy,
advanced Japanese speakers of L2 English still use F0 as the major stress
cue, and they lengthen vowels when they raise F0 (pitch) in the word stress
context.
The main goal of Experiment 3 is to find whether this learning strategy is employed or not
in the production of English stress by native Japanese speakers.
2.2.3. Context of the target word in Experiment 3: Post-nuclear position
Whether the aforementioned learning strategy is employed by Japanese speakers of L2
English or not can be tested in a prosodic context where F0 and duration are not positively
correlated in English stress realization. This condition is satisfied by post-nuclear position.
In English, contrastive focus is assigned to items that are contrasted in a pragmatic
context, as shown in Figure 2-10. In the pragmatic context in Figure 2–10, the subjects of
the two sentences, Bob and I, are contrasted and attract sentence stress with the highest
pitch peak. In American English, after a contrastive focus, the F0 opposition between
stressed and unstressed syllables disappears (post-nuclear deaccentuation), but the
durational contrast remains (see Huss 1978 and Ueyama & Jun 1998 for acoustic
evidence). Therefore, in post-nuclear position, the phonetic contrast between stressed and
unstressed syllables is still realized in terms of the duration correlate in American English,
even though their F0 contrast is lost.
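Figure 2-10: Realization of post-nuclear word stress in English
[Schematic pitch (F0) contour over the utterances "Bob didn’t say SUBject" and "I said SUBject this time": the contrastive focus on I is in nuclear position, and the word stress of the target word (SUBject) is in post-nuclear position.]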
In Japanese, there is a strong tendency toward the suppression of the F0 contrast
between accented and unaccented syllables after a contrastive focus (Maekawa 1994).
However, F0 suppression in post-focus position is not as complete in Japanese as in
English, and there may be a trace of a contrast between lexically accented and unaccented
syllables after a contrastive focus, even though the magnitude of the F0 contrast is less
(Maekawa 1994). Thus, it can be said that English generally shows a greater degree of F0
suppression in post-focus position than Japanese.
2.2.4. Expected patterns
F0 patterns
Based on the findings of earlier studies, it can be predicted that in L1 English there will be
no significant F0 contrast between lexically stressed and unstressed vowels after
contrastive focus, because of post-focus deaccentuation.
The study of Ueyama and Jun (1998) examined how post-focus deaccentuation is
learned by Japanese learners of English. The results of their experiment showed that there
is a positive correlation between L2 oral proficiency and the mastery of post-focus
deaccentuation: i.e., more advanced Japanese learners are better at deleting pitch accents in
order to realize a plateau-shaped F0 contour after a contrastive focus. This finding leads us
to the following prediction: more proficient Japanese learners of English will show less F0
contrast between stressed and unstressed syllables in post-nuclear position.
Duration patterns
According to Huss's (1978) findings, the duration contrast between stressed and
unstressed vowels in English is preserved after a contrastive focus within the same
prosodic phrase, although the F0 contrast disappears. Thus, it can be predicted that in L1
English stressed vowels will be significantly longer than unstressed vowels even in post-nuclear position.
The results of Experiment 1 in the present study show that increased exposure to the
native input of a target language (i.e., English) can help Japanese speakers of L2 English to
activate duration, which is an inert correlate in L1 Japanese, in English neutral declaratives.
Since the help of F0 as a stress cue is not available in post-nuclear position, where the F0
contrast disappears, Japanese learners of English are expected to have more difficulties in
producing a durational contrast between stressed and unstressed vowels in this prosodic
context.
2.2.5. Subjects
Four speakers of L1 English and seven Japanese speakers of L2 English (four advanced and
three beginning) participated in Experiment 3. All of these speakers also participated in Experiment 1.
2.2.6. Speech materials
The same pairs of nouns and verbs used in Experiment 1 were used in Experiment 3:
CONtract vs. conTRACT; DIgest vs. diGEST; PERmit vs. perMIT; SUBject vs.
subJECT. A context sentence and frame sentences with the same target word were
presented in order to elicit an expected prosodic pattern:
Context: I read Reader’s Digest.
Frame:   MOTO didn’t say digest.
         I said digest this time.
Here it is expected that the two items pragmatically contrasted, MOTO and I, attract a
nuclear pitch accent in each sentence and that the post-focus part of each sentence including
the target word digest is deaccented. The first vowel of the target word in the second frame
sentence (I said digest this time, in this example) was measured for F0 and duration values.
Obtained values were analyzed by comparing stressed vs. unstressed conditions (e.g.,
DIgest vs. diGEST) in order to examine the effect of a word stress on each of the two
acoustic correlates of word accent.
2.2.7. Procedure
The same procedure used in Experiment 1 was used (see Section 2.1.6.).
2.2.8. Results of Experiment 3
F0 patterns
As discussed in Section 2.2.4, considering the findings of Huss (1978) and Ueyama & Jun (1998),
we expected that L1 English speakers would show no significant F0 contrast between
stressed and unstressed vowels after a contrastive focus. Figure 2-11 shows the F0 means
and standard deviations of the first syllable of the target word in stressed and unstressed
conditions for each L1 English speaker and each word pair. In Figure 2-11, the word pair
showing a significant difference between the stressed vs. unstressed mean in a Scheffe’s
post-hoc test is marked by an asterisk (α = 0.01). The four speakers of L1 English show
no significant F0 contrast between stressed and unstressed vowels except for one case (the
permit pair in NE3’s speech). This result confirms the expected pattern: in L1 English there
is no F0 contrast of stressed and unstressed vowels in post-nuclear position.
Figure 2-11: F0 means & standard deviations of stressed vs. unstressed vowels for L1 English in post-nuclear position
[Four bar-chart panels (NE1, NE2, NE3, NE4) showing F0 (Hz) of stressed vs. unstressed vowels for the word pairs contract, digest, permit and subject; * = significantly different at α = 0.01.]
Figures 2-12 and 2-13 show F0 means and standard deviations for advanced and
beginning L2 English, respectively. In Figure 2-12, we find two patterns among the four
advanced Japanese speakers of L2 English.
First, AE1, AE3 and AE4 made no F0
distinction between stressed and unstressed vowels in post-nuclear position, indicating that
they were able to deaccent the word stress of the target word in a native-like manner.
Second, AE2 made a significant difference for two word pairs (digest and permit), but not
for the other two pairs (contract and subject).
Figure 2-12: F0 means & standard deviations of stressed vs. unstressed vowels for advanced L2 English in post-nuclear position
[Four bar-chart panels (AE1, AE2, AE3, AE4) showing F0 (Hz) of stressed vs. unstressed vowels for the word pairs contract, digest, permit and subject; * = significantly different at α = 0.01.]
In Figure 2-13, two patterns are observed among the three beginning speakers of L2
English.
First, BE1 and BE3 produced stressed vowels consistently higher than
unstressed vowels for all 4 word pairs, indicating the absence of post-focus
deaccentuation. Second, BE2 showed the same non-native-like pattern for permit and
subject, but not for the remaining two pairs.
Figure 2-13: F0 means & standard deviations of stressed vs. unstressed vowels for beginning L2 English in post-nuclear position
[Three bar-chart panels (BE1, BE2, BE3) showing F0 (Hz) of stressed vs. unstressed vowels for the word pairs contract, digest, permit and subject; * = significantly different at α = 0.01.]
A statistically significant difference between stressed and unstressed F0 means indicates
the presence of a significant F0 contrast between stressed and unstressed vowels and the
absence of post-nuclear deaccentuation; no significant difference indicates the realization of
deaccentuation in post-nuclear position. The distribution of significant F0 contrasts is
summarized for all L1 and L2 English speakers in Table 2-9.
The native pattern is
characterized by the absence of F0 contrast (i.e., post-nuclear deaccentuation), while the
non-native pattern is characterized by the presence of a significant F0 contrast. In the table,
statistically significant differences are indicated by a check mark, and cells showing non-native patterns are shaded.[11]
Table 2-9: F0 contrast in post-nuclear position
        contract   digest   permit   subject
BE1     √          √        √        √
BE2     no         no       √        √
BE3     √          √        √        √
AE1     no         no       no       no
AE2     no         √        √        no
AE3     no         no       no       no
AE4     no         no       no       no
NE1     no         no       no       no
NE2     no         no       no       no
NE3     no         no       √        no
NE4     no         no       no       no
√ = stressed vowel is significantly higher in F0 than unstressed vowel (α = 0.01)
shaded cell = expected non-native pattern
[11] For F0 data, the presence of a significant difference in stressed and unstressed vowels is the expected non-native pattern. In contrast, for duration data, the absence of a significant difference is the expected non-native pattern. In order to show this difference between F0 and duration in terms of expected non-native
patterns, I used both check marks and shaded cells.
Observed L2 English patterns can be classified into three types:
• AE1, AE3 and AE4 suppressed the F0 contrast of stressed and unstressed
  vowels in post-nuclear position for all 4 tested word pairs; this indicates that
  these three advanced speakers of L2 English are able to realize native-like
  post-nuclear deaccentuation successfully.
• BE1 and BE3 preserved the F0 contrast of stressed and unstressed vowels
  for all 4 tested word pairs; this means that these two beginning speakers of
  L2 English do not deaccent post-nuclear stress at all.
• AE2 and BE2 suppressed the F0 contrast for two pairs, but not for the other
  two pairs; this indicates that these two speakers have begun to learn but
  have not mastered post-nuclear deaccentuation.
These results show the following general pattern for L2 English data: more advanced
learners tend to realize post-nuclear deaccentuation (i.e., suppress F0 contrast of stressed
and unstressed vowels after a contrastive focus) more reliably. This confirms the findings
of Ueyama & Jun (1998).
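The three-way classification above can be restated as a simple count of significant F0 contrasts per speaker. The sketch below is illustrative only: the cell values are transcribed from Table 2-9 and should be checked against it, and the names f0_contrast and classify, as well as the wording of the labels, are hypothetical.

```python
# Significant F0 contrasts per L2 speaker, in the order contract, digest,
# permit, subject, as transcribed from Table 2-9 (True = check mark).
f0_contrast = {
    "BE1": [True, True, True, True],
    "BE2": [False, False, True, True],
    "BE3": [True, True, True, True],
    "AE1": [False, False, False, False],
    "AE2": [False, True, True, False],
    "AE3": [False, False, False, False],
    "AE4": [False, False, False, False],
}

def classify(contrasts):
    """Label a speaker by how many word pairs keep a significant F0 contrast."""
    kept = sum(contrasts)
    if kept == 0:
        return "deaccents all pairs"   # native-like pattern
    if kept == len(contrasts):
        return "deaccents no pairs"    # F0 contrast preserved throughout
    return "deaccents some pairs"      # partial mastery

patterns = {speaker: classify(c) for speaker, c in f0_contrast.items()}
for speaker, label in patterns.items():
    print(speaker, label)
```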
Duration patterns
The duration means and standard deviations of stressed and unstressed vowels in post-focus position are plotted for the four speakers of L1 English and for each word pair in
Figure 2-14. Scheffe’s post-hoc tests were conducted to test the statistical significance of
mean differences for each word pair (α = 0.01). A word pair showing a significant
difference between stressed and unstressed means is indicated by an asterisk. In
Experiment 1, it was shown that in nuclear position all 4 L1 English speakers produced a
stressed vowel with significantly longer duration than an unstressed vowel for all 4 word
pair types. For post-nuclear position, a similar relation is observed with one exception (the
subject pair in NE1’s production), as shown in Figure 2-14.
In Section 2.2.4, we predicted that Japanese learners of English would have more
difficulties in producing a significant durational contrast between stressed and unstressed
vowels in post-nuclear position, where the F0 contrast disappears, since they cannot use
the help of F0 to activate duration. This prediction can be tested only if Japanese speakers
of L2 English suppress the F0 contrast successfully. Since the F0 analysis of L2 English
data for Experiment 3 has already shown that the beginning Japanese speakers of L2
English have more difficulties in suppressing F0 contrast after a contrastive focus than the
advanced speakers of L2 English, we will examine only the data of advanced L2 English
for duration patterns.
Figure 2-14: Duration means & standard deviations of stressed vs. unstressed vowels for L1 English in post-nuclear position
[Four bar-chart panels (NE1, NE2, NE3, NE4) showing duration (ms) of stressed vs. unstressed vowels for the word pairs contract, digest, permit and subject; * = significantly different at α = 0.01.]
For the four Japanese speakers of advanced L2 English, means and standard deviations
of stressed and unstressed vowels are plotted in Figure 2-15. Word pairs showing a
significant difference by Scheffe’s post-hoc tests (α = 0.01) are marked by an asterisk. In
Figure 2-14, we observed a largely uniform pattern across the four speakers of L1 English:
stressed vowels were significantly longer in duration than unstressed vowels for the four
tested word pairs, with the single exception noted above. On the other hand, Figure 2-15 shows three different patterns among
the four Japanese speakers of advanced L2 English. AE1 showed a native-like significant
difference between stressed and unstressed vowels for all four word pairs; AE2 and
AE4 showed a significant difference for three out of four pairs; AE3 showed no significant
difference for any word pair.
Figure 2-15: Duration means & standard deviations of stressed vs. unstressed vowels for advanced L2 English in post-nuclear position
[Four bar-chart panels (AE1, AE2, AE3, AE4) showing duration (ms) of stressed vs. unstressed vowels for the word pairs contract, digest, permit and subject; * = significantly different at α = 0.01.]
In order to show the difference between L1 English and advanced L2 English patterns,
the distribution of a significant durational contrast in post-nuclear position is summarized
for L1 English and advanced L2 English in Table 2-11, following the same procedure for
summarizing the distribution of significant F0 contrasts in Table 2-9. Additionally, the
distribution of durational contrast in nuclear position from the data of Experiment 1 is
summarized in Table 2-10. In both tables, a check mark indicates an expected native
English pattern: the mean of the stressed vowel is significantly longer than the mean of the
unstressed vowel. Cells showing non-native patterns are shaded.
Table 2-10: Durational contrast in nuclear position
            AE1   AE2   AE3   AE4   NE1   NE2   NE3   NE4
contract    √     √     √     √     √     √     √     √
digest      √     no    √     √     √     √     √     √
permit      √     √     √     √     √     √     √     √
subject     √     √     √     √     √     √     √     √
Table 2-11: Durational contrast in post-nuclear position
            AE1   AE2   AE3   AE4   NE1   NE2   NE3   NE4
contract    √     √     no    √     √     √     √     √
digest      √     √     no    √     √     √     √     √
permit      √     √     no    no    √     √     √     √
subject     √     no    no    √     no    √     √     √
√ = stressed vowel is significantly longer than unstressed vowel (α = 0.01)
shaded cell = expected non-native pattern
The comparison of the two tables shows the following patterns for advanced L2 English:
• AE1 and AE2 did not show a difference in nuclear vs. post-nuclear
  positions in terms of the frequency of non-native-like patterns: none for
  AE1; one word pair for AE2.
• AE3 and AE4 produced more non-native-like patterns in post-nuclear
  position.
These patterns suggest that Japanese learners of English generally have more difficulty in
producing a durational contrast in post-nuclear than in nuclear position.
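The comparison of the two tables amounts to counting, for each advanced speaker, the word pairs that lack a significant durational contrast in each position. The following sketch shows one way to do this; the cell values are transcribed from Tables 2-10 and 2-11 and should be checked against them, and the names nuclear, post_nuclear and non_native_count are hypothetical.

```python
# True = stressed vowel significantly longer than unstressed vowel (check mark),
# in the order contract, digest, permit, subject. Values are transcribed from
# Tables 2-10 and 2-11 and should be verified against the tables.
nuclear = {
    "AE1": [True, True, True, True],
    "AE2": [True, False, True, True],
    "AE3": [True, True, True, True],
    "AE4": [True, True, True, True],
}
post_nuclear = {
    "AE1": [True, True, True, True],
    "AE2": [True, True, True, False],
    "AE3": [False, False, False, False],
    "AE4": [True, True, False, True],
}

def non_native_count(contrasts):
    """Number of word pairs lacking a significant durational contrast."""
    return sum(not c for c in contrasts)

# (nuclear count, post-nuclear count) for each speaker.
counts = {
    speaker: (non_native_count(nuclear[speaker]),
              non_native_count(post_nuclear[speaker]))
    for speaker in nuclear
}
for speaker, (n, p) in counts.items():
    print(f"{speaker}: nuclear {n}, post-nuclear {p}")
```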
2.2.9. Discussion of Experiment 3
Possible learning strategy in L2 English–L1 Japanese
Results of Experiment 3 showed that in L1 English F0 and duration are manipulated
independently in post-nuclear position: there is no F0 contrast but a significant duration
contrast between stressed and unstressed syllables after contrastive focus. For L2 English,
we found that advanced Japanese learners of English overall have a harder time producing
a significant durational contrast in post-nuclear position than in nuclear position.
In Section 2.2.2, we hypothesized that native Japanese speakers employ the following
learning strategy in their production of English word stress:
Japanese learners of English first employ F0, an active correlate in their L1
Japanese accent system, in order to contrast lexically stressed and
unstressed vowels in their L2 English production, and later learn to use
duration, an inert correlate in their L1 system. By employing this strategy,
advanced Japanese speakers of L2 English still use F0 as the major stress
cue, and they lengthen vowels when they raise F0 (pitch) in the word stress
context.
In order to test this assumption in a strict way, the suppression of the F0 contrast is a
required condition. In this section, we will examine the interaction of F0 and duration
patterns.
L1 English manipulation of F0 and duration
For L1 English, the manipulation of F0 and duration is summarized in Table 2-12, based
on results of Experiments 1 and 3. In L1 English, in nuclear position, F0 and duration are
significantly different in stressed and unstressed vowels. On the other hand, in post-nuclear position, the F0 contrast is lost, but a significant durational contrast is still
preserved.
Table 2-12: Manipulation of F0 and duration in nuclear vs. post-nuclear positions
                         F0    Duration
NUCLEAR (Exp. 1)         √     √
POST-NUCLEAR (Exp. 3)    no    √
√ = significant difference between stressed vs. unstressed means
(stressed > unstressed; α = 0.01)
no = no significant difference
Distribution of F0 and duration contrasts in advanced L2 English
For advanced L2 English data, we also examined the distribution of F0 and duration
contrasts in nuclear vs. post-nuclear position. We conducted Scheffe’s post-hoc tests for
F0 and duration (α = 0.01) in order to test whether stressed and unstressed means were
significantly distinguished for each word pair.
For each position of the target word
(nuclear or post-nuclear), we summarized the distribution of presence vs. absence of
significant mean differences for both F0 and duration for each word pair for each speaker,
then we pooled the 4 advanced Japanese speakers of L2 English. For each word position,
there were 16 cases in total, since there were 4 advanced speakers of L2 English and 4
word pairs (4 speakers X 4 word pairs).
Results are summarized in Tables 2-13 and 2-14a:
Table 2-13: F0 and duration (D.) contrasts in nuclear position
for advanced L2 English
         native-like    non-native-like
F0       √              no     no     √
D.       √              √      no     no
cases    15             0      0      1
Table 2-14a: F0 and duration (D.) contrasts in post-nuclear position
for advanced L2 English

          native-like    non-native-like
F0            no         √       no      √
D.            √          √       no      no
cases         9          2       5       0

√ = significant difference (stressed > unstressed; α = 0.01)
no = no significant difference
Difference between nuclear vs. post-nuclear position
The comparison of Tables 2-13 and 2-14a shows that there are 6 more instances of native-like manipulation of F0 and duration in nuclear than in post-nuclear position (15 vs. 9). This indicates
that advanced Japanese speakers of L2 English have more difficulty in controlling F0 and
duration correlates in post-nuclear position.
This confirms the prediction presented in
Section 2.2.4.
Analysis of non-native-like patterns in post-nuclear position
Non-native patterns in post-nuclear position in Table 2-14a fall into 2 types:
Type 1: positive relation between the manipulation of F0 and duration
        (F0 √, D. √) or (F0 no, D. no)
Type 2: negative relation between the manipulation of F0 and duration
        (F0 √, D. no)
Interestingly, Type 1 patterns outnumber Type 2 patterns (7 vs. 0) in
the post-nuclear data, as presented in Table 2-14b (the same data as in Table 2-14a, with
emphasis on the non-native-like patterns). This result supports the following
hypothesis:
Japanese learners of English learn to activate duration with the help of F0
(i.e., lengthen vowel when F0 is high; keep vowel short when F0 is low)
and only later learn to control duration and F0 independently.
Table 2-14b: F0 and duration contrasts in post-nuclear position
for advanced L2 English

          native-like    non-native-like
                         Type 1  Type 1  Type 2
F0            no         √       no      √
D.            √          √       no      no
cases         9          2       5       0
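The tally behind Tables 2-14a and 2-14b can be sketched in a few lines of Python. The case counts are those reported in Table 2-14a; the function name `classify` and the label strings are illustrative assumptions and are not part of the original analysis.

```python
from collections import Counter

# Each case is a pair: (F0 contrast significant?, duration contrast significant?).
# Counts are taken from Table 2-14a (16 cases = 4 speakers x 4 word pairs).
post_nuclear_cases = (
    [(False, True)] * 9      # F0 no, D. yes (native-like in post-nuclear position)
    + [(True, True)] * 2     # F0 yes, D. yes
    + [(False, False)] * 5   # F0 no, D. no
    + [(True, False)] * 0    # F0 yes, D. no
)

def classify(f0_sig, dur_sig):
    """Label one post-nuclear case (labels are illustrative)."""
    if not f0_sig and dur_sig:
        return "native-like"
    if f0_sig == dur_sig:
        return "Type 1"   # duration contrast present only together with an F0 contrast
    return "Type 2"       # F0 contrast without a duration contrast

tally = Counter(classify(f0, d) for f0, d in post_nuclear_cases)
print(tally["native-like"], tally["Type 1"], tally["Type 2"])
```

Grouping the (√, √) and (no, no) cases under one label makes the 7 vs. 0 asymmetry between Type 1 and Type 2 explicit.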
2.3. Summary of Experiments 1–3
In Experiments 1 and 2, we investigated how the manipulation of F0 and duration in the L1
system transfers to the production of L2 word accent in neutral declaratives. The findings
of earlier studies show how F0 and duration are manipulated in word accent production of
L1 English and L1 Japanese. F0 is actively used in both L1 English and L1 Japanese,
while duration is actively used in L1 English, but not in L1 Japanese:
              F0     Duration
L1 English    YES    YES
L1 Japanese   YES    NO
These expected L1 patterns were overall supported by the results of Experiments 1 and 2.
Our new finding is that in L1 English duration plays a more stable role in contrasting
lexically stressed and unstressed vowels than F0, although both acoustic correlates are
actively manipulated at least in neutral declaratives. Thus, the manipulation of F0 and
duration observed in our L1 data can be summarized in the following way:
              F0            Duration
L1 English    YES    <<<    YES
L1 Japanese   YES           NO
Considering the L1 system, we originally expected that the active role of F0 in both L1
types would positively transfer to L2 production. The F0 analysis of our L2 data brought
support to this prediction, showing that lexically accented vowels were significantly and
consistently higher in F0 than unaccented vowels in both L2 English and L2 Japanese.
For L2 duration patterns, we expected that what must be learned in the target language
would be opposite in L2 Japanese and L2 English, since the role of duration is opposite in
L1 English and L1 Japanese:
              F0     Duration
L1 English    YES    YES    (suppress in L2 Japanese)
L1 Japanese   YES    NO     (activate in L2 English)
In order to learn the duration patterns of their target language, native English speakers
learning Japanese need to learn to suppress the duration contrast in their L2 Japanese, while
native Japanese speakers learning English need to learn to activate duration in their L2
English.
The analysis of L2 duration patterns in Experiments 1 and 2 presents an
asymmetry between L2 English and L2 Japanese in terms of the relation between L2
proficiency and duration patterns. For L2 English, advanced Japanese learners reliably
showed more native-like durational patterns than beginning learners; however, for L2
Japanese, there was no systematic difference in the manipulation of duration between
advanced and beginning learners.
This asymmetry in the development of duration
manipulation in the two L2 types can possibly be explained by the different phonological
status of duration in L1 English and L1 Japanese. This difference results in different
sensitivity to the duration cue and also in the different ability to control duration
consciously in L2 word accent production.
In Experiment 1, advanced Japanese speakers of L2 English were able to produce a
significant durational contrast between stressed and unstressed vowels in addition to an F0
contrast, while beginning learners only produced the F0 contrast consistently. In order to
explain this result, we suggested that advanced speakers of L2 English employ the
following learning strategy: they start learning to lengthen vowels when they raise pitch in
the nuclear-stress context of the target word.
In order to test this assumption, in
Experiment 3, we examined how Japanese speakers of L2 English would manipulate
duration when the English target word was embedded in post-nuclear position, where the
F0 contrast between stressed and unstressed syllables disappears but the durational contrast
remains in L1 English. The comparison of advanced L2 English patterns in Experiments 1
and 3 and the detailed analysis of non-native-like patterns in Experiment 3 brought strong
support to our hypothesis.
Three general points emerge from the results of Experiments 1–3.
First, in the
beginning stage of L2 development, L2 speakers tend to import L1 phonetic habits in L2
word accent production. However, they are also able to modify these L1 habits to simulate
L2 patterns. Consider in this respect the case of F0 in the production of L2 Japanese accent
by English speakers. F0 is an active correlate in both L1 English and L1 Japanese for
word accent production, but the two languages are different in the sense that the F0 contrast
is more robust and stable in L1 Japanese than in L1 English. The results of Experiment 2
showed that native English speakers, regardless of whether they are beginning or advanced
learners of Japanese, produce a more robust and stable contrast of F0 in their L2 Japanese
than in their L1 English. This result indicates that these learners not only are able to use F0
in Japanese accent production, since it is already active in their L1 production of word
accent, but that they are able to modify the phonetic patterns present in their L1 (English) to
produce the robust contrast of F0 in their L2 (Japanese). Second, the asymmetry in the
development of duration manipulation in the two L2 types suggests that we cannot predict
L2 development patterns in a straightforward way on the basis of L1 phonetic habit. In
order to explain the asymmetry, it is necessary to consider whether the manipulation of an
acoustic correlate plays a distinctive role or not in the L1 system. Finally, as a learning
strategy, L2 speakers may use an acoustic correlate which is already active in the L1 system
(e.g., F0 in L1 Japanese), in order to learn to control a correlate which is not active in L1
(e.g., duration in L1 Japanese).
Chapter 3:
Vowel Contrast in L2 English and L2 Japanese
3.1. Vowel system of English and Japanese
The vowel system of L1 English (American English) is richer than the vowel system of L1
Japanese (Tokyo Japanese), as shown in Figure 3-1. American English has nine phonemic
monophthongal vowels, i.e., [i, I, E, æ, ə/ʌ, u, U, O, A]12, while Tokyo Japanese has five
short vowels, i.e., [a, e, i, o, u] (Ladefoged 1993). In Japanese and most dialects of
American English, these vowels are distinguished by height and backness.
Figure 3-1: Vowel system of English and Japanese
[Vowel charts. American English: i, I, E, æ, ə/ʌ, u, U, O, A. Tokyo Japanese: i/ii, e/ee, a/aa, o/oo, u/uu.]
An important characteristic of the Japanese vowel system is that each short vowel has a
long counterpart: [a, aa], [e, ee], [i, ii], [o, oo], [u, uu]. In other words, in Japanese short
and long vowels are phonemically contrasted in every region of the vowel space defined on
the basis of vowel height and backness (e.g., /biiru/ ‘beer’ vs. /biru/ ‘building’ in the high
front region; /tooru/ ‘pass’ vs. /toru/ ‘catch’ in the mid back region).
12 The vowel inventory varies from variety to variety of English, and which vowel is classified as a
monophthong or diphthong varies from study to study. In the present study, we adapted the vowel
inventory (based on Midwestern American English) and classification of monophthongs and diphthongs
presented in Ladefoged’s A Course in Phonetics (1993, pp. 80-84). Ladefoged grouped [i, I, E, æ, ə/ʌ, u,
U, O, A] as monophthongs, and [aI, aU, aI, aU, eI, oU, ju] as diphthongs.
While Japanese has a long-short vowel contrast in every spectral region of the vowel
space, English has a vowel length contrast only in certain regions (e.g., meat /mit/ vs. mitt
/mIt/ in the high front region; pool /pul/ vs. pull /pUl/ in the high back region; bat /bæt/ vs.
bet /bEt/ in the low front region).
The role of vowel length differs in the two languages. In Japanese, vowel length is
phonemic. In English, on the other hand, vowel length is not phonemic; rather, it is one of the
phonetic correlates of the tense-lax contrast (more peripheral vowels, such as [i], are
classified as tense, whereas more central vowels, such as [I], are classified as lax). Because
of this difference, it is interesting to compare the long and short vowels in the production of
L2 English by L1 Japanese speakers and L2 Japanese by L1 English speakers.
3.2. Characteristics of vowel length contrast in English vs. Japanese
In addition to the aforementioned difference in phonemic status, the English tense-lax
contrast and the Japanese long-short contrast differ in three aspects that will be discussed in
the following sections: the prosodic unit involved, phonetic duration, and vowel quality.
The characteristics of each language with respect to these three aspects are summarized in
Table 3-1.
Table 3-1: English and Japanese vowel contrasts

                     English (tense vs. lax)     Japanese (long vs. short)
prosodic unit        SAME                        DIFFERENT
                     tense: monosyllabic         long: bimoraic (V.V)
                     lax: monosyllabic           short: monomoraic (V)
phonetic duration    YES                         YES
contrast             [I] : [i] = 1 : 1.3         [i] : [ii] = 1 : 2~3
vowel quality        YES                         YES?
contrast             tense: more peripheral      long: more peripheral?
3.2.1. Prosodic unit
As mentioned in Chapter 1, in the case of word segmentation, native speakers of English
use the syllable as the minimal segmentation unit, while native speakers of Japanese use the
mora. English speakers segment both tense and lax vowels into single sublexical prosodic
units (= syllables). For example, both seat /sit/ and sit /sIt/ are treated as monosyllabic
words by English speakers. Similarly, Japanese long vowels (but not short vowels) can be
segmented by Japanese speakers into single syllables. However, the crucial difference
between the two languages is that Japanese speakers further break long vowels into two
subsyllabic units (= moras). For example, the word to-o-ru ‘pass’ contains three moras,
but to-ru ‘catch’ contains only two moras.
See Chapter 5 for further discussion of
prosodic units in the speech segmentation of English and Japanese.
3.2.2. Phonetic duration contrast
English tense vs. lax vowels have different phonetic duration, although they do not contrast
phonemically (no phonemic length distinction). I computed the durational ratio of /i/ to /I/
based on the duration data of male speech in Peterson & Lehiste (1960) and Hillenbrand et
al. (1995). The results based on the data of the two studies are consistent: the high front
tense vowel /i/ is about 30% longer in duration than the corresponding lax /I/. The
phonemic length contrast of Japanese vowels is acoustically realized with a greater
durational ratio of long to short vowels than the English tense vs. lax contrast. In word-medial position, long vowels are 2.5 to 3 times longer than the corresponding short vowels
(Han 1962). Thus, English and Japanese are phonetically similar in the sense that some
vowels are characterized by a difference in duration, but this difference in phonetic duration
is much greater in Japanese.
3.2.3. Vowel quality
Vowel quality plays an essential role in contrasting English tense and lax vowels. Tense
vowels are systematically more peripheral in the vowel space than the corresponding lax
vowels (e.g., /i/ is more front and higher than /I/) (see Hillenbrand et al. 1995 for data on
contemporary American vowels). The study of Nishi et al. (1998) shows that there is a
difference between English and Japanese in terms of the role of vowel quality in the
production of tense-lax/long-short vowels. Nishi et al. measured the frequency of the first
and second formants (F1 and F2, henceforth) of English tense and lax vowels, and they
conducted a similar formant analysis for the Japanese long vs. short contrast. Their results
showed that the spectral overlap between Japanese long vs. short vowels (e.g., /ii/ vs. /i/)
is significantly larger than the one between English tense and lax vowels. This indicates
that in Japanese, vowel quality plays a weaker role in acoustically separating long and short
vowels than in English, where tense vs. lax vowels are well differentiated by vowel
quality.
3.2.4. Duration vs. vowel quality in the production of vowel contrasts
The aforementioned review of differences between L1 English and L1 Japanese shows an
interesting balance of the roles of two physical correlates, duration and vowel quality, in
the vowel system of each language. Duration or length plays a weaker role in contrasting
English tense and lax vowels than in contrasting Japanese long and short vowels.
However, in the production of the English tense-lax contrast, the weaker role of duration is
compensated by the greater role of vowel quality. We find the reverse relation in the
production of the Japanese long-short contrast, where duration plays a greater role than
vowel quality. This difference between L1 English and L1 Japanese, together with the
difference in the phonemic status of vowel length, makes it interesting to compare the
production of L2 English by L1 Japanese speakers with the production of L2 Japanese by
L1 English speakers.
3.3. Problems
3.3.1. L2 Japanese–L1 English
It is a well known fact among Japanese language instructors that the phonemic length
contrast of Japanese vowels is very difficult for native English speakers to learn in both
production and perception in the initial stage of their L2 Japanese development. Japanese
short vowels produced by native English speakers tend to be perceived as the
corresponding long vowels by native Japanese speakers. Thus, for example, obasan ‘aunt’
in the production of Japanese by English speakers is often misunderstood as obaasan
‘grandmother’, and a similar confusion occurs in the contrast between shujin and shuujin:

    obasan ‘aunt’       vs.   obaasan ‘grandmother’
    shujin ‘husband’    vs.   shuujin ‘prisoner’
At this point, it is difficult to explain which Japanese vowel category is substituted with
which English vowel category in the production of these minimal pairs.
Do English
speakers produce shuujin with their English tense /u/ and shujin with their lax /U/? If not,
do they rather produce both Japanese long and short vowels in the same spectral region and
differentiate them by duration? In any case, we believe that the observed durational
ambiguity between Japanese long and short vowels in L2 Japanese–L1 English is the
product of the negative transfer of the fact that there is no phonemic length in L1 English.
An additional characteristic of English which may affect the production of Japanese
vowels by English speakers is the dominant role of vowel quality in perceiving the English
tense vs. lax contrast. Bohn and Flege (1990) examined the perception of the tense-lax
contrast (/i/ vs. /I/ and /E/ vs. /œ/) by native English speakers in order to assess the relative
salience of duration and vowel quality. Their results showed that native English speakers
use vowel quality as the main cue to the vowel contrast and are not very sensitive to
durational differences. This suggests that native English speakers are likely to have
difficulties in perceiving the contrast of Japanese long vs. short vowels, which are
significantly different only in duration, but not in vowel quality. The relatively dominant
role of vowel quality (and the weaker status of duration) in the perception of the English
vowel contrast is then expected to affect not only the perception but also the production of
Japanese vowels by native English speakers.
3.3.2. L2 English–L1 Japanese
The prominent durational contrast between Japanese long and short vowels is likely to
transfer to L2 English production, as shown in the following example:
                  mitt    vs.    meat
vowel duration     1       :     2 ~ 2.5
(predicted vowel duration in inexperienced L2 English–L1 Japanese)
If beginning Japanese learners of English pronounce this minimal pair, they are likely to
directly import the durational characteristics of the Japanese vowel contrast, i.e., the
Japanese durational ratio of long to short vowels (approximately 2~2.5).
Sugito (1982b) compared the durations of English /i/ and /I/ produced by native English
speakers with those produced by native Japanese speakers. The results showed that the
duration ratio of /i/ to /I/ is significantly larger in L2 English produced by Japanese speakers
than in L1 English.
The absence of a significant quality difference between Japanese long and short vowels
should also transfer, as shown in the following example:
                  mitt    vs.    meat
vowel quality     [i]            [ii]
(predicted vowel quality in inexperienced L2 English–L1 Japanese)
The example shows the expected pattern of vowel quality in inexperienced L2 English
produced by Japanese speakers.
It is expected that the two vowel categories are
differentiated by a significant difference in duration, but not in vowel quality, due to the
transfer of the absence of a significant quality difference between Japanese short and long
vowels. The transfer of spectral features of L1 Japanese in the production of the English
/i/–/I/ contrast was observed by Sugito (1982b) in the study mentioned earlier.
3.4. Goal
Two experiments were conducted in order to investigate the effect of L1 phonetic habits on
the realization of the vowel contrast in L2 English–L1 Japanese and L2 Japanese–L1
English. Experiments 4 and 5 examined L2 English produced by native speakers of Tokyo
Japanese and L2 Japanese produced by native speakers of American English, respectively,
in terms of the patterns of duration and vowel quality. In each experiment, L1, experienced
L2 and inexperienced L2 speakers were compared in order to see possible developmental
patterns.
Research questions
The comparison of the phonetic features of the vowel contrast in the two L1 types in
Section 3.3 leads us to the following research questions:
Experiment 4 (L2 English–L1 Japanese)
• Do native Japanese speakers learn to weaken their prominent duration
  contrast in the production of English tense and lax vowels?
• Do they learn to produce a native-like significant contrast of vowel quality?

                  L1 Japanese             L2 English
Duration Ratio    long : short = 2 : 1    tense : lax = 1.3 : 1    (weaken contrast in L2 English)
Vowel Quality     long ≈ short            tense ≠ lax              (enlarge contrast in L2 English)
Experiment 5 (L2 Japanese–L1 English)
• Do native English speakers learn to produce a native-like large duration contrast
  between Japanese short and long vowels?
• Do they learn to avoid producing a significant difference in vowel quality in the
  production of Japanese short and long vowels?

                  L1 English               L2 Japanese
Duration Ratio    tense : lax = 1.3 : 1    long : short = 2 : 1    (enlarge contrast in L2 Japanese)
Vowel Quality     tense ≠ lax              long ≈ short            (weaken contrast in L2 Japanese)
In order to answer these questions, we assessed the effect of vowel type (tense vs. lax
for English and long vs. short for Japanese) on duration and vowel quality.
3.5. Method
3.5.1. Speech materials
In Experiment 4, four minimal pairs of English high front tense /i/ vs. lax /I/ were used:
bead – bid, deep – dip, keen – kin and Pete – pit. The target word was presented in a
frame sentence:
    I said ______ next.
In Experiment 5, three minimal pairs of Japanese high front short /i/ and long /ii/ were used
in the first syllable position (all words have a pitch accent on the first syllable):
    biru ‘building’     biiru ‘beer’
    kado ‘corner’       kaado ‘card’
    toru ‘take’         tooru ‘pass’
The target word was presented in a frame sentence:
    sosite ______ to iimasu.
    (‘Next I said ______.’)
3.5.2. Subjects
Four speakers of L1 English and three advanced and three beginning Japanese speakers of
L2 English participated in Experiment 4 (English). All these speakers also participated in
Experiments 1 and 3. Three speakers of L1 Japanese and three advanced and four
beginning English speakers of L2 Japanese participated in Experiment 5 (Japanese). All
these speakers also participated in Experiment 2. See Section 2.1.4 for
information on the language backgrounds of the L2 speakers. BE2, BE3, BE4, AE1, AE3 and
AE2 in Experiments 1 and 3 correspond to BE1, BE2, BE3, AE1, AE2 and AE3 in
Experiment 4, respectively. The seven speakers of L2 Japanese participated in both
Experiments 2 and 5 and were coded in the same way.
3.5.3. Procedure
Recording
For each experiment, sentences with target words were mixed with foil sentences.
Sentences in each reading of the list were pseudo-randomized in different orders. In the
recording session, PsyScope was used to present sentences. One sentence was displayed
on the computer screen at a time.
The subjects were given sufficient time to practice speech materials. They were asked
to read sentences without hesitations or pauses in the middle. They read the sentence list
10 times. The first reading was not analyzed.
Data were recorded in the recording booth of the UCLA phonetics lab for the L1 English,
advanced L2 English, and L2 Japanese groups, and in the recording room of the Meiji Gakuin
University Information Center in Tokyo for the L1 Japanese and beginning L2 English groups.
Measurements
The collected data were digitized with Kay Elemetrics’s CSL at a 10 kHz sampling rate.
Scicon’s PitchWorks was used to measure duration and frequencies of the first and second
formants (F1 and F2, henceforth). Frequencies of F1 and F2 are the physical correlates of
vowel height and backness, respectively. Tokens were not analyzed if:
• there were hesitations or pauses in the middle of the sentence
• words were mispronounced
Duration of the first vowel of each target word was measured on waveforms and wideband spectrograms. Formant frequencies were measured at the center of the first syllable,
using LPC analysis.
Statistical Analysis
Obtained values of duration and formant frequencies (F1 and F2) were analyzed using
two-factor ANOVAs and Scheffe’s post-hoc tests. The independent variables in the
two-factor ANOVAs were vowel type and word pair type. The focus of Experiments 4 and 5 is
on the effect of vowel type on duration and vowel quality (tense vs. lax conditions in
Experiment 4, and long vs. short conditions in Experiment 5). The effect of word pair type
was included in the ANOVAs in order to control for the variance generated by this factor.
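As an illustration of the design described above, the following Python sketch computes the F ratios of a balanced two-factor ANOVA (vowel type × word pair type) from first principles. It is a minimal sketch under stated assumptions, not the software used in this study: the function name `two_way_anova`, the requirement of equal cell sizes, and the duration values are all hypothetical, and p-values and Scheffe’s post-hoc tests are not included.

```python
from statistics import mean

def two_way_anova(cells):
    """F ratios for a balanced two-factor design.

    `cells` maps (vowel_type, word_pair) -> list of observations; every
    cell must hold the same number of observations (balanced design).
    Returns {effect: (F, df_effect, df_error)}.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    assert all(len(v) == n for v in cells.values()), "design must be balanced"

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for the two main effects, the interaction, and the error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "vowel": (ss_a / df_a / ms_err, df_a, df_err),
        "word pair": (ss_b / df_b / ms_err, df_b, df_err),
        "vowel*word pair": (ss_ab / df_ab / ms_err, df_ab, df_err),
    }

# Hypothetical vowel durations in ms (NOT measured data), two tokens per cell.
cells = {
    ("tense", "bVd"): [130.0, 134.0],
    ("lax", "bVd"): [100.0, 104.0],
    ("tense", "pVt"): [110.0, 114.0],
    ("lax", "pVt"): [80.0, 84.0],
}
result = two_way_anova(cells)
for effect, (f_ratio, df_effect, df_error) in result.items():
    print(f"{effect}: F({df_effect}, {df_error}) = {f_ratio:.2f}")
```

Including word pair type as a second factor removes its variance from the error term, so the test of vowel type is not inflated by differences between word pairs.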
3.6. Results of Experiment 4 (English)
3.6.1. Duration
L1 English pattern
All four L1 English speakers showed longer duration means for tense than lax vowels. A
representative pattern is shown in Figure 3-2, in which the means and standard deviations
of tense /i/ vs. lax /I/ in NE1’s production are compared for each word pair.
Figure 3-2: Duration means and standard deviations of tense /i/ and lax /I/
for Speaker NE1 (L1 English)
[Bar chart. x-axis: word pair type (bVd = bead vs. bid; dVp = deep vs. dip; kVn = keen vs. kin; pVt = Pete vs. pit); y-axis: mean duration (ms); bars: tense vs. lax. All four word pairs are marked with an asterisk (* = significantly different at α = 0.01).]
The results of a series of ANOVAs showed that the effect of vowel type (tense vs. lax
vowels) on vowel duration was significant for the data of every L1 English speaker (p <
0.0001). None of the four L1 English speakers showed a significant interaction effect
between vowel type and word pair type. See Table 3-2 for ANOVA results for each
speaker (shaded cells indicate no significant effect). Also, according to the results of a
series of Scheffe’s post-hoc tests (α = 0.01), the difference in duration means in the tense
vs. lax vowels was statistically significant for all 4 tested word pairs for all four L1 English
speakers. In Figure 3-2, word pairs showing significant differences are marked by an
asterisk (α = 0.01).
Table 3-2: ANOVA results for duration data of L1 English in Experiment 4 (α = 0.01)

        vowel                 word pair            vowel*word pair
NE1     F(1, 51) = 52.53      F(3, 61) = 28        F(3, 61) = .76
        p < .0001             p < .0001            p = .5241
NE2     F(1, 61) = 151.89     F(1, 61) = 69.22     F(1, 61) = 1.15
        p < .0001             p < .0001            p = .3374
NE3     F(1, 64) = 89.51      F(1, 64) = 56.49     F(1, 61) = .577
        p < .0001             p < .0001            p = .6351
NE4     F(1, 64) = 105.13     F(1, 64) = 46.84     F(1, 64) = 2.5
        p < .0001             p < .0001            p = .0676
Additionally, duration ratios of tense to lax vowels were computed for each speaker by
dividing the duration value of each tense token by that of the corresponding lax token
for each repetition of each word pair type. The four L1 English speakers showed a similar
pattern: the duration ratios of tense to lax vowels are about 1.3 (i.e., the tense /i/ is about
30% longer than the corresponding lax /I/). A representative pattern is shown in Figure 3-3.
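The ratio computation described above can be sketched in Python as follows. The duration values are hypothetical and serve only to illustrate the pairing of tense and lax tokens by repetition; the names `durations` and `tense_lax_ratios` are likewise assumptions.

```python
from statistics import mean

# Hypothetical vowel durations in ms (NOT measured data), one value per repetition.
durations = {
    "bVd": {"tense": [130.0, 126.0, 134.0], "lax": [100.0, 97.0, 103.0]},
    "pVt": {"tense": [104.0, 99.0], "lax": [80.0, 76.0]},
}

def tense_lax_ratios(pair):
    """Divide each tense token by the lax token from the same repetition."""
    return [t / l for t, l in zip(pair["tense"], pair["lax"])]

# One list of ratios per word pair type, then the mean ratio per word pair type.
ratios = {name: tense_lax_ratios(pair) for name, pair in durations.items()}
mean_ratio = {name: mean(r) for name, r in ratios.items()}

for name, value in mean_ratio.items():
    print(f"{name}: mean tense/lax ratio = {value:.2f}")
```

Taking the ratio within each repetition, rather than dividing the two overall means, keeps each tense token paired with the lax token produced in the same reading of the list.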
Figure 3-3: Average durational ratio of English tense/lax vowels
for Speaker NE1 (L1 English)
[Bar chart. x-axis: word pair type (bVd = bead vs. bid; dVp = deep vs. dip; kVn = keen vs. kin; pVt = Pete vs. pit); y-axis: duration ratio, 0 to 2.]
L2 English patterns
Duration ratios of English tense/lax vowels for L2 English data are presented in Figures 3-4
and 3-5.
Figure 3-4: Mean and standard deviation of duration ratio
of English tense/lax vowels for BE1, BE3 and AE1 (L2 English)
[Three bar-chart panels (BE1, BE3, AE1). x-axis: word pair type (bVd = bead vs. bid; dVp = deep vs. dip; kVn = keen vs. kin; pVt = Pete vs. pit); y-axis: duration ratio (tense/lax), 0 to 3.]
Figure 3-5: Mean and standard deviation of duration ratio
of English tense/lax vowels for AE2, AE3 and BE2 (L2 English)
[Three bar-chart panels (AE2, AE3, BE2). x-axis: word pair type (bVd = bead vs. bid; dVp = deep vs. dip; kVn = keen vs. kin; pVt = Pete vs. pit); y-axis: duration ratio (tense/lax), 0 to 3.]
Three types of ratio patterns were observed in the data of L2 English produced by Japanese
speakers.
• Two beginning L2 speakers (BE1 and BE3) and one advanced L2 speaker (AE1)
  showed durational ratios around or above 2.0, which is similar to the
  durational ratio of Japanese long to short vowels (2.0~2.5, according to
  Han 1962), as shown in Figure 3-4.
• Two advanced L2 speakers (AE2 and AE3) approximated L1-English-like
  ratios for some word pairs, but they showed overcompensation (i.e., no
  significant durational difference between tense vs. lax) for one pair, as
  shown in Figure 3-5.
• An L1-English-like pattern was also produced by one beginning L2 English
  speaker (BE2), as shown in Figure 3-5.
Additionally, for each speaker, obtained ratio values were pooled across all word pairs and
repetitions, and mean ratio and standard deviation were computed. Results are shown in
Figure 3-6. The horizontal lines in the figure delimit the range of the average ratios for the
four L1 English speakers. As this figure shows, the duration ratios of the four L1 English
speakers are clustered around 1.3. This result is consistent with the ratio of tense /i/ to lax
/I/, which we computed from the results of Peterson & Lehiste (1960) and Hillenbrand et
al. (1995) in Section 3.2.2.
AE1, BE1 and BE3 show average ratios larger than 2.0,
similar to Japanese-like ratios, as also shown in Figure 3-4. AE2 and AE3 approximate a
native-like ratio (about 1.3), but remember that they also overcompensate the duration
contrast in some contexts, as shown in Figure 3-5.
Figure 3-6: Average duration ratios of English tense/lax vowels
[Bar chart. x-axis: speakers, grouped as L1 English (NE1, NE2, NE3, NE4) and L2 English (AE1, AE2, AE3, BE1, BE2, BE3); y-axis: duration ratio, 0 to 3. Horizontal lines mark the L1 English range.]
These results show two points regarding the effect of the transfer of L1 Japanese
durational characteristics. First, we found the expected effect of negative transfer from L1
Japanese, in that three L2 English speakers (AE1, BE1 and BE3) showed Japanese-like
durational ratios. Second, it is possible to learn to reduce the Japanese-like duration
contrast, as shown by the fact that AE2, AE3 and BE2 successfully produced native-English-like ratios.
There is no systematic correlation between L2 proficiency and the duration ratio of L2
English tense /i/ to lax /I/, given that one advanced L2 English speaker (AE1) showed a
Japanese-like ratio while one beginning speaker (BE2) showed a native-like ratio.
3.6.2. Vowel quality
In order to analyze vowel quality, we measured frequencies for F1 and F2. Obtained
values were plotted, using UCLA Phonetics Lab’s Plot Formants software. In each
speaker’s plot, the mean and two standard deviations of the F1 and F2 frequencies are shown
by the position of the phonetic symbol of each vowel category and an ellipse circling the
symbol, respectively.
L1 English patterns
Results of the four L1 English speakers are presented in Figure 3-7. The four L1 English
speakers showed very consistent patterns: tense /i/ is higher and more front than lax /I/.
Also, there was no ellipse overlap between the two vowel types, indicating that in L1
English tense /i/ and lax /I/ are significantly distinguished by vowel quality. This confirms
the findings of previous studies.
Figure 3-7: /i/ and /I/ in the vowel space of L1 English speakers
[Four formant-plot panels (NE1, NE2, NE3, NE4). Axes: F1 frequency (Hz) and F2 frequency (Hz); [i] = tense; [I] = lax.]
L2 English patterns
As opposed to the L1 English speakers, the three beginning Japanese speakers of L2
English showed no clear separation of tense /i/ and lax /I/, with a large overlap of the
ellipses of the two categories, as shown in Figure 3-8.
Figure 3-8: /i/ and /I/ in the vowel space of beginning L2 English speakers
[Three formant-plot panels (BE1, BE2, BE3). Axes: F1 frequency (Hz) and F2 frequency (Hz); [i] = tense; [I] = lax.]
The three advanced speakers of L2 English showed a clear separation between the means of
tense /i/ and lax /I/, with still some overlap of ellipses, as shown in Figure 3-9. This
indicates that the advanced speakers of L2 English could distinguish the two vowel
categories in terms of vowel quality, but not as consistently as L1 English speakers.
Notice that the pattern of advanced L2 English is somewhere between L1 English and
beginning L2 English patterns.
Figure 3-9: /i/ and /I/ in the vowel space of advanced L2 English speakers
[Three formant-plot panels (AE1, AE2, AE3). Axes: F1 frequency (Hz) and F2 frequency (Hz); [i] = tense; [I] = lax.]
Euclidean distance data
The formant plots of individual speakers show three distinctive patterns, corresponding to
the three speaker groups in terms of spectral separation between tense /i/ and lax /I/. In
order to quantify degrees of spectral separation, I computed the Euclidean distance between
tense and lax tokens of each minimal pair for each repetition of each speaker.
The computation method is schematized in Figure 3-10. First, the F1 and F2 differences
between the tense token T and the lax token L, i.e., the F1 and F2 distances between the
two tokens, were computed (a and b in Figure 3-10); then the Euclidean distance between the
tense and lax tokens (c in Figure 3-10) was computed for each repetition of each minimal
pair (e.g., keen for /i/ vs. kin for /I/). For each speaker, the mean and standard deviation
of the Euclidean distance were computed.
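Figure 3-10: Euclidean distance (c) between the tense /i/ token T and the lax /I/ token L
[Schematic: tokens T and L in the F1 frequency (Hz) by F2 frequency (Hz) plane, with a and b as the legs and c as the hypotenuse of a right triangle]
a = F1 of T - F1 of L
b = F2 of T - F2 of L
$c = \sqrt{a^2 + b^2}$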
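The computation schematized in Figure 3-10 can be sketched in a few lines of Python. This is a minimal illustration, not the analysis procedure actually used in the study: the function names and the formant values are hypothetical, and each token is assumed to be an (F1, F2) pair in Hz.

```python
import math
from statistics import mean, stdev


def euclidean_distance(tense, lax):
    """Distance c between a tense token T and a lax token L in the F1-F2 plane.

    Each token is an (F1, F2) pair in Hz.
    """
    a = tense[0] - lax[0]  # a = F1 of T - F1 of L
    b = tense[1] - lax[1]  # b = F2 of T - F2 of L
    return math.hypot(a, b)  # c = sqrt(a^2 + b^2)


def summarize_speaker(token_pairs):
    """Mean and standard deviation of c over all (tense, lax) repetitions."""
    distances = [euclidean_distance(t, l) for t, l in token_pairs]
    return mean(distances), stdev(distances)


# Hypothetical formant values (Hz) for two repetitions of one minimal pair.
# These are illustrative numbers only, not data from the experiments.
pairs = [((300, 2300), (420, 1900)), ((310, 2250), (400, 1950))]
mean_c, sd_c = summarize_speaker(pairs)
```

A larger mean distance corresponds to a greater spectral separation between the two vowel categories for that speaker.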
The results are summarized in Figure 3-11. In this figure, greater distances mean that /i/
and /I/ are further separated from each other, indicating that the two vowel categories are
differentiated by vowel quality to a larger extent. The three speaker groups are significantly
differentiated by the magnitudes of Euclidean distances: L1 English > advanced L2 English
> beginning L2 English. This indicates that more experienced Japanese speakers of L2
English could approximate native-English-like spectral separation between tense /i/ and lax
/I/ more reliably, that is, there is a positive correlation between L2 English proficiency and
the vowel quality contrast between tense /i/ and lax /I/.
The results show two points regarding the effect of transfer. First, we found the
expected effect of negative transfer from L1 Japanese in that beginning Japanese learners of
English showed no spectral separation between English tense /i/ and lax /I/. Second, there
is a positive effect of learning, given that advanced Japanese learners showed more
native-like separation between the two vowel categories (even though their spectral
separation is not as stable as the separation produced by L1 English speakers).
Figure 3-11: Euclidean distance between English tense /i/ and lax /I/
[Bar chart: mean Euclidean distance (y-axis, 0 to 800) for each speaker, grouped as L1 English (NE1–NE4), advanced L2 English (AE1–AE3) and beginning L2 English (BE1–BE3); the L1 English range and the L1 Japanese range are marked]
3.7. Results of Experiment 5 (Japanese)
3.7.1. Duration
L1 Japanese pattern
All three L1 Japanese speakers showed longer duration means for long vowels. A
representative pattern is shown in Figure 3-12, where the means of long and short vowels
in NJ1’s production are compared for each word pair. The results of two-way ANOVAs
showed that the effect of the vowel type (long vs. short vowels) on vowel duration was
significant for the data of every L1 Japanese speaker (p < 0.0001). None of the three L1
Japanese speakers showed any significant interaction effects of the vowel type and the
word pair type. See Table 3-3 for ANOVA results for each speaker (shaded cells indicate
no significant effect). Also, according to the results of a series of Scheffé’s post-hoc tests
(α = 0.01), the difference in duration means in the long vs. short condition was statistically
significant for all three word pairs for all three L1 Japanese speakers. In Figure 3-12, word
pairs showing significant differences are marked by an asterisk (α = 0.01).
Table 3-3: ANOVA results for duration data of L1 Japanese in Experiment 5 (α = 0.01)
Speaker   vowel                           word pair                      vowel*word pair
NJ1       F(1, 43) = 689.19, p < .0001    F(1, 43) = 21.45, p < .0001    F(1, 43) = .35, p = .7070
NJ2       F(1, 45) = 805.79, p < .0001    F(1, 45) = 15.04, p < .0001    F(1, 45) = 3.76, p = .0309
NJ3       F(1, 48) = 1976.1, p < .0001    F(1, 48) = 19.82, p < .0001    F(1, 48) = 4.5, p = .0156
Figure 3-12: Duration means and standard deviations of short and long vowels
for Speaker NJ1 (L1 Japanese)
[Bar chart: mean duration (ms) of short and long vowels by word pair: biru (biru vs. biiru), kado (kado vs. kaado), toru (toru vs. tooru); all three pairs are marked with an asterisk]
(* = significantly different at α = 0.01)
Additionally, duration ratios of long to short vowels were computed for each speaker’s
data by dividing the duration of each long token by that of the corresponding
short token for each repetition of each word pair type. The three L1 Japanese speakers
showed a similar pattern: the duration ratios of long to short vowels are about 2.0 to 2.5
(i.e., long vowels are 2.0 to 2.5 times as long as the corresponding short vowels in L1
Japanese production). A representative pattern is shown in Figure 3-13.
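The ratio computation described above can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the function name is hypothetical, long and short tokens are assumed to be paired by repetition index, and the durations are invented for the example rather than taken from the experiment.

```python
from statistics import mean


def duration_ratios(long_ms, short_ms):
    """Long/short duration ratio for each repetition of one word pair.

    Tokens are paired by position: the i-th long token is divided by
    the i-th short token.
    """
    if len(long_ms) != len(short_ms):
        raise ValueError("each long token needs a corresponding short token")
    return [long_d / short_d for long_d, short_d in zip(long_ms, short_ms)]


# Hypothetical durations (ms) for three repetitions of one word pair.
long_tokens = [200.0, 210.0, 190.0]
short_tokens = [100.0, 100.0, 95.0]

ratios = duration_ratios(long_tokens, short_tokens)
mean_ratio = mean(ratios)
```

A mean ratio near 1.0 would indicate no durational difference between the two vowel types, while values between about 2.0 and 2.5 correspond to the L1 Japanese pattern reported here.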
Figure 3-13: Duration ratios of Japanese long/short vowels
for Speaker NJ1 (L1 Japanese)
[Bar chart: duration ratio (y-axis, 0 to 3) by word pair type (biru, kado, toru) for NJ1]
L2 Japanese patterns
Overall, the seven L2 Japanese speakers also produced Japanese long vowels with
significantly longer duration than the corresponding short vowels for all three word pairs.
This pattern is observed in Figure 3-14, where the average duration ratios of Japanese long
to short vowels are shown for all participants. The horizontal lines in the figure delimit the
range of duration ratios produced by the three L1 Japanese speakers (approximately 2.0 to
2.3). A ratio of about 1.0 indicates that there is no durational difference between long and
short vowels. This figure shows that the durational ratio of every L2 Japanese speaker is
greater than 1.5, which means that long vowels were at least 1.5 times as long as the
corresponding short vowels in L2 Japanese. This indicates that all seven L2 Japanese
speakers showed duration ratios of Japanese long/short vowels larger than their L1 ratio of
English tense/lax vowels (approximately 1.3).
Figure 3-14: Average duration ratios of Japanese long/short vowels
[Bar chart: average long/short duration ratio (y-axis) for each speaker: L1 Japanese (NJ1–NJ3) and L2 Japanese (AJ1–AJ3, BJ1–BJ4); horizontal lines mark the L1 Japanese range]
By comparing L1 and L2 ratios, we can observe patterns in L2 Japanese:
• AJ1 and BJ1 showed durational ratios within the L1 Japanese range, i.e.,
the two speakers approximated the L1 Japanese pattern. BJ4 showed a ratio
closer to the L1 Japanese range with a large standard deviation, which
indicates that his approximation of the L1 Japanese pattern was not stable.
• BJ3 showed a duration ratio which was smaller than L1 Japanese ratios but
still larger than L1 English ratios.
• Two advanced (AJ2, AJ3) and one beginning (BJ2) speakers of L2
Japanese showed ratios much greater than the range of L1 Japanese ratios.
This means that these three speakers exaggerated the durational contrast
between Japanese long and short vowels (overcompensation effect).
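The three-way grouping above amounts to comparing each speaker's mean ratio with the L1 Japanese range. The Python sketch below illustrates that comparison; the function name and category labels are hypothetical, the default range uses the approximate values cited in the text (2.0 to 2.3), and it simplifies the grouping by treating any ratio above the range as exaggerated, whereas the text reserves that description for ratios much greater than the range.

```python
def classify_ratio(ratio, l1_range=(2.0, 2.3)):
    """Place a long/short duration ratio relative to the L1 Japanese range.

    l1_range holds the approximate lower and upper bounds of the ratios
    produced by the L1 Japanese speakers.
    """
    low, high = l1_range
    if ratio > high:
        return "exaggerated"       # larger contrast than L1 Japanese
    if ratio >= low:
        return "within L1 range"   # approximates the L1 Japanese pattern
    return "below L1 range"        # smaller contrast than L1 Japanese


# Hypothetical mean ratios, for illustration only.
examples = {ratio: classify_ratio(ratio) for ratio in (1.6, 2.1, 3.0)}
```

A ratio below the range can still exceed the L1 English tense/lax ratio of about 1.3, as in the case of BJ3.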
These results show two points regarding the effect of the transfer of the durational
characteristics of L1 English vowels. First, we did not find the expected effect of negative
transfer from L1 English, given that all L2 Japanese speakers were able to produce a
durational contrast between Japanese long and short vowels greater than their L1 durational
contrast between English tense and lax vowels (i.e., about 1.3). However, some speakers
exaggerated the contrast. Second, we found no major difference between advanced and
beginning L2 Japanese.
3.7.2. Vowel quality
L1 Japanese pattern
The formant frequencies of Japanese long /ii/ and short /i/ for all three L1 Japanese
speakers are shown in Figure 3-15. The results showed a consistent L1 Japanese pattern:
the means of long /ii/ and short /i/ in L1 Japanese were close to each other with no ellipse
separation. This is very different from the L1 English pattern, which, as we saw in
Experiment 4, is characterized by a clear spectral separation between tense /i/ and lax /I/.
This difference between the two languages clearly emerges from the comparison of the
formant plots of NJ1 and NE1 in Figure 3-16 (both speakers are males). The absence of a
clear separation between long /ii/ and short /i/ in NJ1’s vowel space indicates that vowel
quality does not play any role in phonetically distinguishing the two vowel categories (at
least in the high front region), unlike the case of L1 English.
Another important characteristic of L1 Japanese is illustrated by the comparison of
NJ1’s L1 Japanese and NE1’s L1 English patterns in Figure 3-16. Both Japanese long /ii/
and short /i/ in L1 Japanese production are located in the spectral region of English tense /i/.
This means that the vowel quality of Japanese /ii/ and /i/ is similar to the quality of English
tense /i/.
Figure 3-15: /i/ and /ii/ in the vowel space of L1 Japanese speakers
[Panels: NJ1, NJ2, NJ3; axes: F1 frequency (Hz) by F2 frequency (Hz); [i] = long, [I] = short]
Figure 3-16: Spectral contrast in L1 Japanese vs. L1 English in the high front region
[Panels: NJ1 (Japanese), with [i] = long and [I] = short; NE1 (English), with [i] = tense and [I] = lax; axes: F1 frequency (Hz) by F2 frequency (Hz)]
L2 Japanese patterns
The formant frequencies of Japanese long /ii/ and short /i/ were plotted for advanced and
beginning speakers of L2 Japanese in Figures 3-17 and 3-18, respectively. In Figure 3-17,
a pattern representative of L1 English is also shown.
Figure 3-17: /i/ and /ii/ in the vowel space of
AJ1, AJ2 and AJ3 (advanced L2 Japanese) and NE1 (L1 English)
[Panels: AJ1, AJ2, AJ3, with [i] = long and [I] = short; cf. NE1 (L1 English), with [i] = tense and [I] = lax; axes: F1 frequency (Hz) by F2 frequency (Hz)]
In Figure 3-17, we find a consistent pattern across the three advanced speakers of L2
Japanese: long /ii/ and short /i/ are tightly clustered in the spectral region of L1 English
tense /i/. This indicates that there is no significant difference between long /ii/ and short /i/
in vowel quality in the production of advanced L2 Japanese. The comparison of advanced
L2 Japanese (AJ1–3) and L1 English (NE1) shows that all three advanced speakers of L2
Japanese selected the region of English tense /i/ and distinguished Japanese /ii/ and /i/ by
varying duration.
Figure 3-18: /i/ and /ii/ in the vowel space of beginning L2 Japanese speakers
[Panels: BJ1, BJ2, BJ3, BJ4; axes: F1 frequency (Hz) by F2 frequency (Hz); [i] = long, [I] = short]
Similarly, the spectral region of English tense /i/ was selected in the production of both
Japanese long /ii/ and short /i/ by the four beginning speakers of L2 Japanese, as shown by
the comparison of Figures 3-17 and 3-18. A difference between beginning and advanced
L2 Japanese is that the areas of the ellipses are larger in beginning L2 Japanese, indicating
that the vowel quality of /i/ and /ii/ in beginning L2 Japanese varied to a greater extent.
Euclidean distance data
To quantify these comparisons, I computed the Euclidean distance between Japanese long
and short tokens for the minimal pair of /ii/ vs. /i/ for each speaker following the same
method used to compute the Euclidean distance between English tense and lax tokens in
Experiment 4 (see Figure 3-10). Average Euclidean distances are shown for all speakers in
Figure 3-19.
The comparison of individual plots across L1 Japanese, advanced L2 Japanese and
beginning L2 Japanese has already shown that none of the seven L2 Japanese speakers
produced a quality contrast between Japanese long /ii/ and short /i/. This L2 Japanese
pattern is also confirmed by Figure 3-19: the average distance for every L2 Japanese
speaker is much smaller than the distance in the range of L1 English, and it is located
within the L1 Japanese range. This suggests that all seven speakers of L2 Japanese could
approximate the L1 Japanese pattern. The comparison of individual plots in Figures 3-17
and 3-18 has shown that the vowel quality of /i/–/ii/ in beginning L2 Japanese varied more
than that in advanced L2 Japanese. This difference is also observed in Figure 3-19: the
average Euclidean distances of beginning L2 Japanese tend to have taller error bars.
Figure 3-19: Average Euclidean distance between Japanese long /ii/ and short /i/
[Bar chart: mean Euclidean distance (y-axis, 0 to 800) for each speaker, grouped as L1 Japanese (NJ1–NJ3), advanced L2 Japanese (AJ1–AJ3) and beginning L2 Japanese (BJ1–BJ4); the L1 English range and the L1 Japanese range are marked]
The results suggest three general patterns for L2 Japanese–L1 English. First, all seven
L2 Japanese speakers could approximate the L1 Japanese pattern, given that there was no
significant difference between long /ii/ and short /i/ in vowel quality. Second, native
English speakers learning Japanese do not replace Japanese short /i/ with English lax /I/.
They rather seem to choose the spectral region of tense /i/ and produce Japanese long and
short high front vowels within that region. Finally, there was no evidence for the effect of
negative transfer of L1 English spectral characteristics in L2 Japanese produced by English
speakers.
3.8. Discussion of Experiments 4 and 5
3.8.1. Vowel contrast in L1 English vs. L1 Japanese
The L1 patterns emerging from the results of Experiments 4 and 5 are summarized in Table
3-4.
Table 3-4: Summary of L1 English and L1 Japanese patterns
observed in Experiments 4 and 5
duration contrast
    L1 English     YES: tense > lax (about 30% longer)
    L1 Japanese    YES: long > short (about 2.0 ~ 2.5 times longer)
quality contrast
    L1 English     YES: tense is more peripheral than lax (clear separation between tense
                   and lax with no spectral overlap)
    L1 Japanese    NO: long /ii/ and short /i/ are produced in the spectral region close to
                   L1 English tense /i/
Japanese long and short vowels in the same spectral region are differentiated only by
duration. Both long /ii/ and short /i/ are produced in the spectral region of English tense /i/.
In contrast, English tense and lax vowels are differentiated by both duration and vowel
quality. An important difference between the two L1 types is that the duration ratio of
Japanese long to short vowels (about 2.0 to 2.5) is greater in magnitude than that of
English tense to lax vowels (about 1.3). These patterns in Table 3-4 confirm the findings
of previous studies, which were summarized in Table 3-1.
3.8.2. Duration contrast in L2 English and L2 Japanese vowels
The patterns of duration contrast in L2 English and L2 Japanese are summarized in Table 3-5
and discussed in the sections that follow it.
Table 3-5: Summary of duration contrast in
L2 English and L2 Japanese vowels observed in Experiments 4 and 5
duration contrast
    L2 English–L1 Japanese
        • negative transfer of the L1 Japanese contrast (greater duration contrast)
        • however, possible to learn the L1 English duration contrast by weakening the
          L1 Japanese duration contrast (positive learning effect)
        • tend to overweaken the duration contrast (overcompensation effect)
        • no systematic correlation with L2 English proficiency
    L2 Japanese–L1 English
        • able to produce the L1 Japanese-like pattern by enhancing phonetic duration
          contrast in L1 English
        • however, tend to exaggerate the duration contrast by overlengthening a long
          vowel (overcompensation effect)
        • no systematic correlation with L2 Japanese proficiency
Negative transfer of the L1 duration contrast
In Experiment 4, we found an effect of negative transfer of the L1 duration pattern in L2
English produced by Japanese speakers: some Japanese speakers of L2 English produced a
duration ratio of English tense/lax vowels as large as the duration ratio of L1 Japanese
long/short vowels. However, we have found no evidence of the negative transfer of the L1
English pattern in L2 Japanese produced by native English speakers: none of the L2
Japanese speakers produced a duration ratio of Japanese long/short vowels as small as the
ratio of L1 English tense/lax vowels.
There is an asymmetry between the two L2 types in terms of the effect of the negative
transfer of the L1 duration pattern. While there was some evidence for the negative transfer
of the L1 Japanese pattern in L2 English, we did not find any evidence of the negative
transfer of the L1 English pattern in L2 Japanese. This finding suggests that weakening the
prominent phonemic duration contrast of L1 vowels (the case of L2 English produced by
native Japanese speakers) may be more challenging than enhancing the L1 non-phonemic
duration contrast (the case of L2 Japanese produced by native English speakers).
Another possibly relevant factor is that the Japanese kana alphabets provide English
learners of Japanese with visual cues helpful to acquire the distribution of phonemic length
among Japanese vowels. Kana letters represent moras; thus, the moraic segments /Q/, /R/
and /N/ are represented by separate letters. The short and long vowels that were
investigated in Experiment 5 are differentiated by the number of letters in the kana system.
A significant effect of kana literacy acquisition on young Japanese children’s awareness of
speech segmentation units was found by Inagaki, Hatano and Otake (2000). Their
experiments tested the segmentation of words containing CVN, CVQ, CVV and CV by 4- to
6-year-old children. Results indicated that “the children’s conscious segmentation of
words... developed from being a mixture of syllable- and mora-based to being
predominantly mora-based as they learned to read kana letters” (Inagaki et al. 2000, p. 70).
A similar effect on the segmentation of Japanese words by native English speakers
participating in Experiment 5 is possible, given the fact that they all acquired kana letters.
This hypothesis needs to be tested in further studies, for example by comparing English
speakers with kana literacy with English speakers without kana literacy (but equally fluent
in Japanese) in terms of their segmentation of Japanese words.
Positive learning effect
In Section 3.4, the following research questions were asked regarding the learning of the
duration characteristics of a target language, with respect to L2 English and L2 Japanese:
• Do native Japanese speakers learn to weaken their prominent phonemic
duration contrast in the production of English tense and lax vowels?
• Do native English speakers learn to produce a native-like large duration
contrast between Japanese short and long vowels?
The results of Experiments 4 and 5 have answered the questions, showing evidence for
a positive learning effect in both L2 types: it is possible for native Japanese speakers to learn
to approximate the native English contrast by weakening the prominent contrast in their L2
English production, and vice versa for L2 Japanese production by native English speakers.
Overcompensation effects
Two advanced Japanese speakers of L2 English (AE2 and AE3) successfully avoided
producing the prominent duration contrast of L1 in L2 English production. However, in
some contexts, they overweakened the duration contrast and eliminated the durational
difference between English tense and lax vowels. This is an example of an
overcompensation effect.
Interestingly, the opposite direction of overcompensation was observed in L2 Japanese
produced by native English speakers. Three speakers of L2 Japanese (AJ2, AJ3 and BJ2)
consistently exaggerated the durational difference between Japanese long and short vowels.
The means and standard deviations of Japanese long and short vowels are plotted for these
three L2 Japanese speakers and one L1 Japanese speaker (NJ1) in Figure 3-20. The
comparison of their patterns shows that the exaggerated contrast is due to overlengthening
of the long vowels.
Figure 3-20: Mean and standard deviation of Japanese long and short vowels
for AJ2, AJ3, BJ2 (L2 Japanese) and NJ1 (L1 Japanese)
[Four panels: AJ2, AJ3, BJ2 and NJ1 (L1 Japanese); bars show mean duration (ms, 0 to 400) of short and long vowels by word pair type (biru, kado, toru)]
On the other hand, there is no single factor contributing to the aforementioned
overweakening of the L2 English duration contrast in the production of AE2 and AE3.
The individual plots of these two L2 English speakers and a plot of a native English
speaker’s data are shown in Figure 3-21. AE2’s durational contrast between tense /i/ and
lax /I/ disappeared in the Pete–pit pair. It is not possible to tell whether this pattern is due
to a shortening of tense /i/ or to a lengthening of lax /I/. This is a case of genuine
neutralization of the duration contrast between tense and lax vowels. On the other hand,
the absence of duration contrast in AE3’s production of the bead–bid pair is due to
excessive lengthening of both tense /i/ and lax /I/. We also notice that the overall duration of
both tense and lax vowels is much greater in the production of AE2 and AE3 than in the
production of NE1. However, it is difficult to make a connection between this L2 pattern
and the aforementioned overweakening of the L2 English duration contrast shown by AE2
and AE3.
Figure 3-21: Mean and standard deviation of English tense and lax vowels
for AE2, AE3 (advanced L2 English) and NE1 (L1 English)
[Three panels: AE2, AE3 and NE1 (L1 English); bars show mean duration (ms, 0 to 250) of tense and lax vowels by word pair type: bVd (bead vs. bid), dVp (deep vs. dip), kVn (keen vs. kin), pVt (Pete vs. pit)]
(* = significantly different at α = 0.01)
These speakers of L2 English and L2 Japanese avoided importing L1 duration patterns
in their production of L2 vowels. However, the observed cases of overcompensation
effect in both L2 types indicate how challenging it is to master the phonetic habits of the
target language.
No correlation with L2 English proficiency
The data on L2 English in Experiment 4 showed no systematic correlation between duration
patterns and L2 English proficiency. That was also the case for L2 Japanese in Experiment
5. These results suggest that there is no systematic developmental pattern in the acquisition
of the L2 duration contrast in either L2 type.
3.8.3. Quality contrast in L2 English and L2 Japanese vowels
The patterns of vowel quality contrast in L2 English and L2 Japanese are summarized in the
following table and discussed in the sections that follow it.
Table 3-6: Summary of quality contrast in
L2 Japanese and L2 English vowels observed in Experiments 4 and 5
quality contrast
    L2 English–L1 Japanese
        • negative transfer of the L1 Japanese pattern (no lax category)
        • able to learn to distinguish tense /i/ vs. lax /I/ by quality, but still not with a
          native-like degree of spectral separation
        • positive correlation with L2 English proficiency
    L2 Japanese–L1 English
        • able to produce the L1 Japanese pattern (no strong evidence for negative
          transfer of the L1 English pattern)
        • already have tense and lax categories in L1 English –> choose the tense [i] area
          and just distinguish Japanese short and long vowels by duration
Transfer effect of L1 quality characteristics
For vowel quality, the results of Experiments 4 and 5 revealed an asymmetrical pattern
between the two L2 types in terms of the transfer effect of the spectral characteristics of the
L1 vowel contrast. In Experiment 4, we found an effect of negative transfer of the L1
Japanese pattern in the production of English tense /i/ and lax /I/ by native Japanese
speakers: the three beginning Japanese learners of English produced both English tense /i/
and lax /I/ in the spectral region of Japanese /i/, which is close to English tense /i/, and
differentiated the two categories only by duration. In contrast, we found no evidence of the
negative transfer of the L1 English pattern in L2 Japanese produced by native English
speakers. Both beginning and advanced speakers of L2 Japanese in Experiment 5 were
able to produce the L1 Japanese pattern, i.e., to produce the target vowels only within the
spectral region of English tense /i/, which is close to Japanese /i/–/ii/, and to distinguish
Japanese long and short vowels by duration.
The presence vs. absence of negative transfer of L1 patterns in L2 English and L2
Japanese, respectively, can be explained by the following difference between L1 Japanese
and L1 English. In the high front region of the vowel space, there is no lax category in L1
Japanese, i.e., the region of English lax /I/ is never used. Thus, native Japanese speakers
need to learn to use the English lax region and create a new vowel category in order to
produce the quality contrast of English /i/–/I/. On the other hand, the spectral region of
Japanese /i/-/ii/ is already present in the L1 English system (i.e., tense /i/), so there is no
need to create a new vowel category in the production of L2 Japanese by native English
speakers.
It would have been possible that English tense /i/ and lax /I/ were mapped to Japanese
long /ii/ and short /i/, respectively, in the production of L2 Japanese by native English
speakers. However, in the data of Experiment 5, we did not find any case of this possible
negative transfer pattern. Native English speakers map English /i/ onto both Japanese long
/ii/ and short /i/ without using the region of English lax /I/. This pattern could be due to the
positive transfer of the high sensitivity of native English speakers to spectral information.
As mentioned in Section 3.3.1, Bohn and Flege (1990) found that native English speakers
use spectral information as a dominant cue in the perceptual discrimination of the English
tense-lax contrast in the front region. It is likely that, thanks to their high L1 spectral
sensitivity, native English speakers realize that Japanese long and short vowels produced
by L1 Japanese speakers are close to English tense /i/, and then produce the two vowel
types as long and short versions of English tense /i/, respectively.
Learning effect
In Section 3.3, the following research questions were asked regarding the learning of the
quality characteristics of the target language, with respect to L2 English and L2 Japanese.
• Do native Japanese speakers learn to produce a native-like significant
contrast of tense-lax vowel quality?
• Do native English speakers learn to suppress a significant quality contrast in
the production of Japanese short and long vowels?
As just discussed, opposite tasks are involved in approximating the pattern of the target
language in the two L2 types. Native Japanese speakers learning English need to create a
new vowel category in the spectral region of English lax /I/, while native English speakers
learning Japanese need to avoid using the spectral region of English lax /I/ and produce
Japanese long and short vowels in the region of English tense /i/. The results of
Experiment 4 show evidence for the negative transfer of the L1 Japanese pattern into L2
English and also a positive learning effect in the patterns of advanced L2 English. In the
L2 Japanese data, we do not find a single case in which an L2 Japanese speaker produced a
Japanese vowel in the English lax /I/ region. Furthermore, comparing individual formant
plots between advanced L2 English and L2 Japanese in Figures 3-9 and 3-17, we find that
advanced speakers of L2 Japanese produce the native-like quality contrast more reliably
than advanced speakers of L2 English, who can differentiate English tense /i/ and lax /I/ by
quality, but still with some spectral overlap between tense /i/ and lax /I/.
These results lead to two general conclusions. First, in both L2 types, it is possible to
learn the pattern of the target language. Second, however, it is more challenging for native
Japanese speakers to produce a native-like quality contrast in the production of English
tense and lax vowels than for native English speakers to avoid producing a significant
quality contrast in the production of Japanese short and long vowels. Presumably, this is
due to the fact that it is harder to create a new vowel category (lax /I/ in the case of L2
English–L1 Japanese) than it is to use a category already available in L1 in a different
context (as in the case of L2 Japanese–L1 English). This hypothesis should be tested in
further studies by comparing the two L2 types in the production of vowel categories in
spectral regions other than the high front region considered in this chapter.
3.9. Summary of Experiments 4 and 5
In Experiments 4 and 5, we investigated how the patterns of duration and vowel quality in
L1 vowel production transfer to vowel contrast in L2 English produced by Japanese
speakers and L2 Japanese produced by English speakers, respectively. The findings of
earlier studies were confirmed by our L1 data. L1 Japanese long and short vowels in the
same spectral region are consistently distinguished only by duration, while L1 English
tense and lax vowels are distinguished by both duration and vowel quality. The two
languages differ in terms of the magnitude of the duration contrast: the ratio of Japanese
long to short vowels is significantly greater than the ratio of English tense to lax vowels.
The duration results show that it is possible to learn the L2 pattern by learning to
suppress the L1 duration pattern in the production of L2. However, L2 learners tend to
exaggerate the duration pattern of their L2. This is the case for both L2 English and L2
Japanese. The absence vs. presence of negative transfer from L1 in L2 Japanese and L2
English, respectively, suggests that it may be more challenging to weaken a prominent L1
duration contrast (the case of L2 English–L1 Japanese) than to enhance an L1 contrast in order
to approximate the duration contrast of L2 (the case of L2 Japanese–L1 English). This
difference between L2 English–L1 Japanese and L2 Japanese–L1 English may reflect the
difference between L1 Japanese and L1 English in terms of the phonemic status of the
duration correlate: i.e., in L2 production, a duration contrast which is phonemic in L1 is
harder to change or manipulate than a contrast which is not phonemic in L1. In other
words, phonemic L1 duration patterns are more likely to negatively transfer to L2
production than duration patterns that are only phonetic.
The analysis of vowel quality showed strong evidence of the negative transfer of the L1
Japanese pattern in L2 English, but no evidence for the negative transfer of the L1 English
pattern in L2 Japanese. This asymmetry can be explained in terms of a difference between
the two types of L1 in the phonemic vowel system: L1 Japanese speakers learning English
have to develop a new phonetic/phonological category, i.e., English lax /I/, and this is
harder than the task faced by L1 English speakers learning Japanese, who already possess
a category equivalent to Japanese /i/-/ii/, i.e., English tense /i/, so that they simply have to
learn to produce long and short versions of this category.
Chapter 4: Temporal Organization Across Syllables in L2 English and L2 Japanese
4.1. English stress vs. Japanese mora timings
4.1.1. Stress-foot and mora as basic timing units
As mentioned in Section 1.4.3, languages have been traditionally classified into different
timing categories, on the basis of the notion that temporal organization of speech is based
on isochronous units of timing (for a summary, see Beckman 1992; Tajima 1998). English
is classified as a language in which the fundamental unit of timing is the stress foot, i.e., a
stress-timed language (Pike 1945; Abercrombie 1967). French is classified as a language
in which the syllable is used as the basic timing unit, i.e., a syllable-timed language.
Finally, Japanese is classified as a mora-timed language having the mora as a timing unit
(see Jinbo 1927, Trubetzkoy 1939, Bloch 1950, and Hattori 1960 for early proposals of
mora isochrony; Warner and Arai [submitted] for a review).
In the theory of isochrony, the duration of each occurrence of the basic timing unit is
assumed to be equal. In the case of English stress-timing, the duration of the interval
between successive stresses (i.e., the interstress interval) is expected to be the same no
matter how many other factors may vary (e.g., the number of unstressed syllables, the
segmental composition of the syllables forming the foot). If this theory is correct, there
should be, for example, no difference in the duration of interstress intervals across the
following three sentences (the relevant stressed syllables are marked by acute accents):
Send Ánn hóme.
Send Ánna hóme.
Send ánimals hóme.
However, “[m]any investigators, beginning with Class (1939), have measured interstress
intervals in English, and all have shown that they are objectively longer as they contain
more and more syllables (see Lehiste 1977; Kawasaki 1983 for a review)” (Dauer 1983, p.
52).
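The prediction of strict isochrony can be expressed as a simple check: if interstress intervals were truly constant, regressing interval duration on the number of syllables in the interval would yield a slope of about zero, whereas the measurements cited above imply a positive slope. The Python sketch below is illustrative only; the function name and the durations are hypothetical and are not measurements from any of the studies cited.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Syllables in the interstress interval of the three example sentences:
# "Ann" (1), "Anna" (2), "animals" (3).
syllable_counts = [1, 2, 3]

# Hypothetical interval durations (ms), invented for illustration.
strictly_isochronous = [400.0, 400.0, 400.0]
lengthening_with_syllables = [400.0, 450.0, 500.0]

slope_isochronous = least_squares_slope(syllable_counts, strictly_isochronous)
slope_lengthening = least_squares_slope(syllable_counts, lengthening_with_syllables)
```

In this sketch the slope is the number of milliseconds added to the interval per additional syllable, so a value near zero would be the signature of strict stress-timing.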
A number of studies were also conducted to test the theory of isochrony with respect to
Japanese mora-timing.
“Because of the effects of inherent segmental duration and
differences in moraic structure (non-CV morae), it is clear that morae are not completely
isochronous n Japanese”, as pointed by Warner and Arai (submitted, pp. 4-5). In an early
experimental study of mora-timing, Han (1962) argued that the mora is a unit of duration in
Japanese, and that all morae are of approximately (but not exactly) equal length, and that
this is achieved through durational compensation within a mora. The hypotheses proposed
by Han (1962), the nearly equal duration of different morae and compensation, are the
focus of most early work on mora-timing13. However, Warner and Arai, on the basis of a
review of the major studies on this topic, conclude that the experimental results are too
inconsistent to support the claim that moras have approximately the same length.
13
After many counterexamples against the idea of inter-mora compensation as the basis of Japanese
mora-isochrony were presented, a new definition of mora-timing was proposed by Port et al. (1987).
Port et al. define mora-timing not as a tendency of all moras toward equal duration, but rather as
predictability of word duration from the number of moras in the word. “Under this new definition, if
one mora is shorter than average, the others are expected to be longer in order to maintain constant
word duration, not shorter in order to match it... This version of the hypothesis predicts negative
correlation both within and across mora boundaries” (Warner & Arai submitted, p. 14). A few
experimental studies were conducted to test this hypothesis. Some studies (e.g., Bradlow et al. 1995;
Sato 1995, 1996, 1998) presented experimental support for the claim of Port et al., while other
studies did not (e.g., Campbell & Sagisaka 1991; Campbell 1991; Han 1994).
4.1.2. Factors characterizing different timing types
It has been proposed in later studies that different timing types are characterized not by
different types of isochronous units, but rather by other properties of speech. For example,
Dauer (1983) suggested that crosslinguistic differences in timing patterns are affected by the
interaction of factors such as the phonetic realization of lexical accent, degrees of vowel
reduction or syllable structure. These factors are helpful in understanding differences between
English and Japanese timing patterns.
As discussed in Chapter 2 of the present study, English and Japanese differ greatly in
terms of how duration is treated in the realization of lexical accent (see Section 2.1.2 for a
review). In English, stressed syllables are longer than unstressed syllables, and
consequently the durational patterns of utterances are strongly affected by the distribution
of lexical stress. A stressed syllable is subject to lengthening, so that an unstressed syllable
tends to be shorter in duration. On the other hand, in Japanese, lexical accent does not
affect the duration of syllables. Consequently, there is no significant durational difference
between accented and unaccented syllables in Japanese, and the durational patterns of
utterances are mainly dependent on the distribution of phonemic short and long segments.
In other words, while in English lexical accent properties and rhythmic organization are
closely related, in Japanese these two aspects are independent, and rhythmic organization
depends largely on the phonemic distribution of short and long segments.
In stress-timed languages, more complex syllable structures tend to be found in
stressed syllables, while simple structures (CV) occur in unstressed syllables. This
difference results in a higher average number of segments in the stressed syllables, which
contributes to the higher average duration of the stressed syllables (see Fant, Kruckenberg
& Nord 1991). The inventory of syllable types is more limited in syllable-timed languages
like French or Spanish, and even more limited in mora-timed languages like Japanese. The
types of syllable structure frequently occurring in recorded speech in English (stress-timed),
Spanish (syllable-timed) and Japanese (mora-timed) are presented in Table 4-1, which shows
percentages of CV and V syllable types occurring in these languages. Over half of the
syllables in Spanish and Japanese have a simple CV structure, while English shows a
wider distribution among different types of syllables. The proportion of CV is higher in
Japanese than in Spanish.
Table 4-1: Percentages of CV and V syllable types in English, Spanish and Japanese14
                         CV     V
English (Dauer 1983)     34%    8%
Spanish (Dauer 1983)     58%    6%
Japanese (Otake 1990)    73%    10%
Dauer (1983) observed that syllables with more complicated structure tend to be
stressed in a stress-timed language: “[T]here is a strong tendency for ‘heavy’ syllables
(those containing many segments) to be stressed and ‘light’ syllables (those containing few
segments) to be unstressed. That is, syllable structure and stress are more likely to
reinforce each other in a stress-timed than in a syllable-timed language” (Dauer 1983, pp.
55-56). Thus, heaviness makes stressed syllables longer than unstressed syllables.
Dauer also points out that, besides syllable structure, the segmental composition of
syllables reinforces the difference between stressed and unstressed syllables in English. In
the English text analyzed by Dauer (1983), most unstressed CV syllables (92%) were
composed of a consonant plus /ɪ/, /ə/ and /ɚ/, whereas most stressed CVs (83%) had /ɔ/,
/ou/, /ɛ/ or /eɪ/ (all vowels which have longer inherent duration) as their nucleus. In
Japanese, however, there is no difference in terms of segmental composition between
accented and unaccented syllables.
14
This table is adapted from Table 6 in Otake (1990).
4.2. Linguistic factors investigated
The temporal organization of speech is also affected by linguistic factors above the
segmental and syllabic levels. In this chapter, we focus on the effects of two linguistic
factors of this sort: 1) the effect of parts of speech; 2) the effect of the lapse constraint and
the culminativity requirement. The first and second factors will be examined in Experiments 6
and 7, respectively.
4.2.1. Effect of parts of speech on temporal organization
An additional difference between English and Japanese temporal organization concerns
their morphosyntactic properties. In English, monosyllabic function words are typically
treated as phonological clitics and realized in their weak forms. A clitic is a linguistic item
which “exhibits behavior intermediate between that of a word and that of an affix.
Typically, a clitic has the phonological form of a separate word, but cannot be stressed and
is obliged to occupy a particular position in the sentence in which it is phonologically
bound to an adjoining word, its host... Any process by which an independent word is
reduced to a clitic is called cliticization” (Trask 1996, pp. 74-75). It is a major
characteristic of English prepositions and articles that they may exhibit an extremely close
phonological connection with the word that follows (Selkirk 1984).
The phonetic consequence of the cliticization of English function words is that they are
unstressed and durationally reduced, like unstressed syllables of content words. Thus, in
the following example, the preposition in in the first sentence is cliticized and durationally
reduced, taking the following content word, act, as its host (stressed syllables are
capitalized), and it tends to be spoken in temporal proximity to its host. Consequently, the
two sentences are phonetically identical (Selkirk 1984, to appear)15:
We will SEE them in ACT THREE.
We will SEE them enACT THREE.
The cliticization and phonetic reduction of function words in English contribute to the
formation of larger prosodic units.
15
The phonological treatment of the function word and the content word in prosodic organization has been
controversial. For example, Selkirk’s (1984) position is substantially different from that of Nespor &
Vogel (1987) and Hayes (1989). For discussion, see the review paper by Shattuck-Hufnagel & Turk
(1996). In the present study, for consistency, I have adopted Selkirk’s approach, in which both a sequence
of a function word and a content word (e.g., in ACT) and a multisyllabic content word (e.g., enACT) form
the same type of prosodic unit (i.e., the prosodic word).
On the other hand, Japanese function morphemes are not independent words but bound
morphemes. They are typically suffixed to content words. For example, in the following
sentence, three function morphemes, -ga, -o, -ta, are bound to content words and cannot
stand by themselves:
Taroo-ga    mikan-o          tabe-ta
name-NOM    tangerine-ACC    eat-PAST
‘Taroo ate tangerines’
Like English function words, Japanese function morphemes (case particles in this case) are
not accented. However, unlike English function words, Japanese function morphemes are
not subject to shortening.
The results of Experiments 1 and 3 show that the durational reduction of the unstressed
syllables of content words in English is challenging to learn for native speakers of
Japanese. How will Japanese learners of English treat English monosyllabic function
words in the rhythmic organization of their L2 English? Can they learn to cliticize and
durationally reduce English function words? These questions will be answered in
Experiment 6.
4.2.2. The lapse constraint and the culminativity requirement
In English, there is a tendency to avoid many consecutive unstressed/unaccented syllables.
This phenomenon has been referred to with the term “lapse constraint” (Selkirk 1983, p.
49). The lapse constraint affects the distribution of stress in English stress timing, where
stresses obey a rhythmic tendency towards even spacing, so that they are neither too close
nor too far apart (e.g., see Liberman 1975; Selkirk 1984 for a review). At the phrase level,
“a stressless syllable may serve as stressed when it is not adjacent to a stressed syllable
(‘Promotion’)...” (Hayes 1984, p. 917).
Moreover, English words are subject to the culminativity requirement. Culminativity
requires that each content word has a single strongest syllable, bearing the main stress
(Liberman & Prince 1977, p. 262; cited in Hayes 1995).
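The two requirements can be stated as simple checks over a stress pattern. The sketch below is illustrative only: the string encoding ('1' for a main stress, '0' for an unstressed syllable), the function names, and the threshold of two unstressed syllables are assumptions, since the sources cited above speak only of avoiding "many" consecutive unstressed syllables.

```python
def longest_lapse(stress_pattern: str) -> int:
    """Length of the longest run of unstressed syllables ('0') in a pattern
    such as '10010', where '1' marks a stressed syllable."""
    longest = current = 0
    for mark in stress_pattern:
        current = current + 1 if mark == "0" else 0
        longest = max(longest, current)
    return longest


def violates_lapse(stress_pattern: str, max_unstressed: int = 2) -> bool:
    """True if the pattern contains more consecutive unstressed syllables
    than allowed. The default threshold is an assumption."""
    return longest_lapse(stress_pattern) > max_unstressed


def satisfies_culminativity(word_stress: str) -> bool:
    """True if a content word has exactly one main stress ('1')."""
    return word_stress.count("1") == 1
```

Under this encoding, a word coded '010' satisfies culminativity, whereas an all-unstressed sequence such as '00000' fails both checks.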
Both the lapse constraint and the (word-level) culminativity requirement are
language-specific characteristics. Indeed, in Japanese, neither the lapse constraint nor the
culminativity requirement is enforced. Consequently, Japanese can have entire phrases
without any pitch accent (Beckman 1992; Kubozono & Nakau 1998). Thus, the following
sentence, which consists of a long sequence of unaccented syllables, is possible in
Japanese:
watashi-wa   mukashi       kagoshima-no   inaka-kara         jyookyoo-shita
I-TOP        in the past   (place)-of     countryside-from   came to Tokyo
‘I came to Tokyo from the countryside of Kagoshima in the past’
(adapted from Kubozono & Nakanishi 1998, p. 22)
Do the lapse and culminativity constraints on English stress distribution transfer
negatively to the L2 Japanese produced by native speakers of English? If native speakers
of English have to produce long trains of unaccented syllables in Japanese, will they assign
stress accents, due to the negative transfer of the lapse constraint? These questions will be
answered in Experiment 7 by examining how Japanese phrases consisting of only
unaccented syllables are treated by native speakers of English.
4.3. Experiment 6 (English)
4.3.1. Expected patterns in L2 English
In Section 4.2.1, we reviewed how the distribution of parts of speech affects the temporal
organization of English and Japanese. In English, monosyllabic function words are
cliticized, and they are durationally reduced as much as the unstressed syllables of content
words are. Thus, we expect the following L1 English pattern:
Expected L1 English pattern:
In L1 English production, the duration of monosyllabic function words is not
significantly different from that of the unstressed syllables of content words.
To be able to group multiple words into a larger prosodic unit is one of the challenges
in L2 prosodic development. How an utterance is grouped into prosodic units is generally
called phrasing. We can easily notice that less proficient learners have difficulties with
phrasing. Previous studies showed that less proficient learners tend to put fewer words
into one phrase (i.e. less proficient speech is more “choppy”). Ueyama and Jun (1998)
investigated the phrasing patterns of Japanese speakers and Korean speakers of L2 English
by analyzing tone types, and they found that less advanced speakers of L2 English produce
smaller L2 prosodic units regardless of their L1 types: i.e., it is more challenging to
produce longer prosodic units in L2 speech production. A similar correlation was found by
Jun and Oh (2000), a study that investigated intonation patterns in L2 Korean produced by
native English speakers. How is this difficulty in L2 prosodic development reflected in the
temporal/durational aspects of L2 speech?
In L1 English, as mentioned above, the cliticization of function words, which is
phonetically cued by duration reduction and consequent temporal proximity to the content
word that follows it, contributes to the formation of larger prosodic units. Given that L2
English speakers have difficulties in producing large prosodic units, they are likely to fail to
merge clitic function words into larger prosodic units with their hosts. If they do fail to
treat function words as clitics, they will probably not reduce the vowels of such function
words. Thus, we assume that the aforementioned general difficulty in forming larger L2
prosodic units must be reflected by difficulty in the cliticization and durational reduction of
English function words. If this is correct, we expect that L2 English speakers have a
harder time learning the durational reduction of English function words than the durational
reduction of an unstressed syllable within a content word. In other words, we expect the
following order in the acquisition of durational reduction in L2 English produced by
Japanese speakers: Japanese speakers of L2 English first learn to shorten unstressed
syllables of content words and establish a durational contrast between stressed and
unstressed syllables at the word level; they later learn to cliticize and durationally reduce
function words, so that they can produce a larger prosodic unit than a word. Thus, we
expect the following duration pattern in the production of L2 English by native Japanese
speakers:
Expected L2 English pattern:
In L2 English produced by less experienced Japanese speakers, monosyllabic
function words are longer than the unstressed syllables of content words16.
16
This hypothesis pertains only to those Japanese speakers of L2 English who have already learned to shorten
unstressed vowels. Beginning speakers do not differentiate stressed and unstressed vowels by duration, as
shown by the results of Experiment 1, so they will probably not differentiate monosyllabic function words
from the unstressed syllables of content words, since they do not produce vowel reduction in either context.
4.3.2. Method
Subjects
Subjects for Experiment 6 were selected from the pool of candidates on the basis of
fluency of reiterant speech, since some candidates showed significant difficulties even after
a certain amount of practice. The final set of subjects included one control group and one
experimental group. The control group consisted of three native speakers of American
English: NE1, NE2 and NE3. NE1 was a male speaker, while NE2 and NE3 were female
speakers. The experimental group consisted of 5 native speakers of Japanese who had
been staying in the United States for more than 5 years (AE1, AE2, AE3, AE4 and AE5).
According to the criterion used to determine proficiency levels in the present study, all of
them can be classified as advanced learners: i.e., they have stayed in an
English-speaking country for more than 3 years. We assume that they have already learned some
vowel reduction, based on our finding in Experiment 1. All Japanese participants spoke
the Tokyo dialect as their native tongue. AE1 and AE3 participated in other English
experiments of the present study (Experiments 1, 3 and 4), while the other three speakers
did not. AE1 in Experiment 6 is the same speaker classified as AE4 in Experiments 1 and 3
and AE3 in Experiment 4, while AE3 in Experiment 6 is the same speaker classified as AE2
in Experiments 1 and 3. Background information for all native Japanese speakers is
presented in Table 4-2.
Table 4-2: Background information of L2 English speakers in Experiment 6
       age   gender   years of residence   age of arrival   age of beginning of    duration of
                      in the US            in the US        English instruction    English instruction
AE1    29    female   11 years             18               13                     10 years 3 months
AE2    32    female   9 years              22               13                     13 years
AE3    28    female   7 years              21               13                     13 years
AE4    23    female   5 years              18               13                     12 years
AE5    25    female   5 years              20               13                     8 years
Speech Materials
The corpus for Experiment 6 contained three pairs of test sentences. Sentences in each pair
were identical in terms of the size of the Inter-Stress Interval (ISI), and they differed in terms
of the context of the tested unstressed syllables (within a polysyllabic content word vs. in a
monosyllabic function word). In order to see how the number of interstress syllables
affects duration, the ISI size varied from 1 to 3 syllables. The three pairs of test sentences
are listed in Table 4-3 (the acute mark indicates stress). The syllables which will be analyzed
for duration patterns are underlined.
Table 4-3. Test sentences in Experiment 6
         unstressed syllable in a content word    monosyllabic function word
ISI=1    Gó togéther                              Sée the góvernor
ISI=2    Sée the philósopher                      Bórrow a pénny
ISI=3    Gó to the philósopher                    Húrry to the ládder
We controlled for the position of the tested syllables with respect to the beginning of the
sentence as well as for syllable weight (i.e., all target syllables have a CV structure). In
order to avoid epenthesis in L2 English production by native speakers of Japanese, we
avoided using words with consonant clusters.
Recording
The six target sentences were mixed with foil sentences. Sentences in each reading of the
list were pseudo-randomized in different orders. To avoid segmental effects on duration
while preserving prosodic patterns, the participants were asked to replace each syllable of
the text with the syllable /no/ (reiterant speech). For example, “Go together” had to be read
as /NO noNOno/ (“NO” and “no” indicate stressed and unstressed syllables, respectively).
The participants were given sufficient time to practice reiterant speech before the recording.
We made sure that all our subjects were able to perform the task17 . The speakers had to
read each sentence ten times. The first reading was not analyzed. Data were recorded in
the recording booth of the UCLA phonetics lab.
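The reiterant-speech substitution described above can be sketched as follows. This is a minimal illustration, not the procedure used in the experiment (the participants performed the substitution themselves while reading). The function name and the hand-supplied syllable and stress coding are assumptions.

```python
def reiterant(words: list[list[bool]], syllable: str = "no") -> str:
    """Render a sentence as reiterant speech.

    Each word is a list of booleans, one per syllable (True = stressed).
    Stressed syllables are written in capitals, following the notation
    used in the text.
    """
    return " ".join(
        "".join(syllable.upper() if stressed else syllable for stressed in word)
        for word in words
    )


# "Gó togéther": a stressed monosyllable, then unstressed-stressed-unstressed.
go_together = [[True], [False, True, False]]
# "Sée the góvernor": stressed, unstressed function word, stressed-unstressed-unstressed.
see_the_governor = [[True], [False], [True, False, False]]
```

Here `reiterant(go_together)` gives the string `NO noNOno`, matching the example in the text.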
17
The Japanese participants were also asked to produce a set of Japanese sentences using reiterant speech.
None of them showed difficulties.
Speech data were collected in two different periods, using different methods to present
the reading materials to speakers: during the first data collection period, speakers read
sentences from sheets, whereas during the second period sentences were presented on a
computer screen. We believe that duration patterns were not affected by this difference.
NE3’s performance was recorded in two different sessions using the two different
presentation methods. In the first session, she read a list of pseudo-randomized sentences
from sheets. After one year, she participated in the second session and read sentences
presented on the computer screen. There was no significant difference in the duration
patterns produced by this speaker in the two sessions.
Two L1 English speakers (NE1, NE2) and four Japanese speakers of L2 English
(AE2–AE5) read the list of sentences from sheets. For the other two speakers (NE3 and
AE1), PsyScope was used to present sentences. Sentences were displayed on the
computer screen one at a time.
Measurement
The recorded data were converted from analog to digital at a 10 kHz sampling rate. We
measured the duration of the unstressed vowel /o/ of the syllable no for each of the six
tested conditions, using Kay Elemetrics’ Computerized Speech Laboratory (CSL). All the
measurements were based on waveform analysis and wide-band spectrograms. The energy
distribution was additionally inspected in cases of difficult segmentation. Since
segmental duration is affected by the prosodic structure of utterances, we checked that each
speaker produced similar intonation patterns across tokens by inspecting the sequence of
pitch accents, phrase tones and boundary tones.
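As a rough illustration of the arithmetic behind such duration measurements, the sketch below converts vowel boundaries expressed as sample indices into milliseconds at the 10 kHz sampling rate and summarizes them by mean and standard deviation. The boundary values are hypothetical, not data from the experiment, and the function names are assumptions. CSL reports times directly, so this is not a description of the actual workflow.

```python
from statistics import mean, stdev

SAMPLING_RATE_HZ = 10_000  # 10 kHz, as stated in the text


def duration_ms(start_sample: int, end_sample: int,
                rate_hz: int = SAMPLING_RATE_HZ) -> float:
    """Duration in milliseconds between two sample indices."""
    if end_sample < start_sample:
        raise ValueError("end boundary precedes start boundary")
    return (end_sample - start_sample) * 1000.0 / rate_hz


def summarize(durations: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of a set of durations."""
    return mean(durations), stdev(durations)


# Hypothetical vowel onsets and offsets (sample indices), for illustration only.
tokens = [(1200, 1900), (3050, 3800), (5100, 5900)]
durations = [duration_ms(start, end) for start, end in tokens]
```

At 10 kHz one sample corresponds to 0.1 ms, which bounds the temporal resolution of any boundary placed on the digitized waveform.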
Statistical Analysis
Obtained values of duration were analyzed using two-factor ANOVAs and Scheffé’s post-hoc
tests. The independent variables in the two-factor ANOVAs were the morphosyntactic context
of unstressed syllables and the size of the ISI. The
focus of Experiment 6 is on the effect of the morphosyntactic context of unstressed
syllables on duration (content vs. function word conditions). The effect of the size of the
ISI was included in the ANOVAs in order to control for the variance generated by this
factor.
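For readers unfamiliar with the two-factor design, the sketch below computes the F ratios for the two main effects and their interaction in a balanced layout (equal numbers of tokens per cell). It is a hand-rolled illustration under that assumption, not the statistical package used for the present analyses. It omits p-values and Scheffé’s post-hoc test, and the durations in the example are invented.

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """F ratios for a balanced two-factor design.

    `cells` maps (context, isi) to a list of durations; every cell must
    hold the same number of tokens. Returns, for each effect, a tuple
    (F, effect df, error df).
    """
    contexts = sorted({c for c, _ in cells})
    isis = sorted({i for _, i in cells})
    n = len(cells[(contexts[0], isis[0])])
    if any(len(cells[k]) != n for k in product(contexts, isis)):
        raise ValueError("balanced design required")

    grand = mean(x for v in cells.values() for x in v)
    ctx_mean = {c: mean(x for i in isis for x in cells[(c, i)]) for c in contexts}
    isi_mean = {i: mean(x for c in contexts for x in cells[(c, i)]) for i in isis}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for the two main effects, the interaction and the error.
    ss_ctx = n * len(isis) * sum((ctx_mean[c] - grand) ** 2 for c in contexts)
    ss_isi = n * len(contexts) * sum((isi_mean[i] - grand) ** 2 for i in isis)
    ss_int = n * sum(
        (cell_mean[(c, i)] - ctx_mean[c] - isi_mean[i] + grand) ** 2
        for c, i in product(contexts, isis)
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_ctx, df_isi = len(contexts) - 1, len(isis) - 1
    df_int = df_ctx * df_isi
    df_err = len(contexts) * len(isis) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "context": (ss_ctx / df_ctx / ms_err, df_ctx, df_err),
        "ISI": (ss_isi / df_isi / ms_err, df_isi, df_err),
        "context*ISI": (ss_int / df_int / ms_err, df_int, df_err),
    }


# Invented durations in ms (2 contexts x 2 ISI sizes x 2 tokens), not study data.
toy = {
    ("content", 1): [61.0, 63.0], ("content", 2): [63.0, 65.0],
    ("function", 1): [65.0, 67.0], ("function", 2): [67.0, 69.0],
}
result = two_way_anova(toy)
```

Each F ratio would then be compared against the F distribution with the corresponding degrees of freedom at the chosen α level.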
4.3.3. Results of Experiment 6
L1 English patterns
In Figure 4-1, the mean duration and standard deviation of the unstressed vowel /o/ in the
two morphosyntactic contexts are plotted for each speaker of L1 English. ANOVA results
are presented in Table 4-4 (shaded cells represent significant effects).
Figure 4-1: Mean duration & standard deviation of unstressed /o/
for L1 English speakers in Experiment 6 (α = 0.01)
[Figure: one panel per speaker (NE1, NE2, NE3); x-axis: ISI=1, ISI=2, ISI=3; y-axis: /o/ duration (ms);
bars: monosyllabic function word vs. unstressed syllable in content word]
Table 4-4: ANOVA results for L1 English speakers (α = 0.01)
       context                        ISI                            context*ISI
NE1    F(1, 54) = .668, p = .4172     F(2, 54) = 8.67, p = .0005     F(2, 54) = .291, p = .7483
NE2    F(1, 54) = 0.003, p = .9559    F(2, 54) = 9.537, p = .003     F(2, 54) = 2.888, p = .0644
NE3    F(1, 40) = .998, p = .3776     F(2, 40) = .600, p = .4430     F(2, 40) = .444, p = .6446
Figure 4-1 shows no systematic relation between the duration of the unstressed vowel /o/
and the morphosyntactic context of the vowel for any speaker of L1 English. The absence
of a systematic relation between /o/ duration and the morphosyntactic context is confirmed
by the results of ANOVAs in Table 4-4: none of the L1 English speakers shows any effect
of the morphosyntactic context of unstressed syllables. These results show that a
monosyllabic function word and an unstressed syllable in a content word are treated in the
same way in the temporal organization of L1 English, as expected.
L2 English patterns
Mean durations of unstressed /o/ as produced by advanced speakers of L2 English are
presented in Figure 4-2. In this figure, we can see that AE2–AE5 showed noticeably
greater means for the monosyllabic function word context. On the other hand, AE1
showed a much smaller difference between the two morphosyntactic contexts. These two
patterns among the five Japanese speakers of L2 English are also illustrated by the ANOVA
results presented in Table 4-5: in AE1’s data, the effect of morphosyntactic context on
duration is not statistically significant, while it is significant in the data of the other four
advanced speakers of L2 English (AE2-AE5).
Figure 4-2: Mean duration & standard deviation of unstressed /o/
for advanced speakers of L2 English in Experiment 6 (α = 0.01)
[Figure: one panel per speaker (AE1, AE2, AE3, AE4, AE5); x-axis: ISI=1, ISI=2, ISI=3; y-axis: /o/ duration (ms);
bars: monosyllabic function word vs. unstressed syllable in content word; asterisks mark word pairs
showing significant differences]
We additionally performed Scheffé’s post-hoc test for each speaker to find for which ISI
size there was a significant context effect (i.e., function word vs. unstressed syllable of a
content word). Results are presented in Figure 4-2 (word pairs showing significant
differences are marked by an asterisk). In the production of AE4 and AE5, on average,
vowel duration was significantly longer in monosyllabic function words than in unstressed
syllables of polysyllabic content words for all ISIs. AE2 showed the same pattern for
ISI=2 and ISI=3, and so did AE3 for ISI=1 and ISI=2. Only AE1 showed the L1-English-like
pattern: there was no significant difference between the two contexts at any ISI.
Table 4-5: ANOVA results for advanced speakers of L2 English (α = 0.01)
       context                         ISI                            context*ISI
AE1    F(1, 48) = .138, p = .7118      F(2, 48) = 4.844, p = .0117    F(2, 48) = 1.044, p = .3601
AE2    F(1, 54) = 24.699, p < .0001    F(2, 54) = 1.749, p = .1836    F(2, 54) = 2.375, p = .1026
AE3    F(1, 54) = 18.403, p < .0001    F(2, 54) = 6.524, p = .0029    F(2, 54) = 3.75, p = .0299
AE4    F(1, 54) = 45.486, p < .0001    F(2, 54) = 2.141, p = .1274    F(2, 54) = .447, p = .6231
AE5    F(1, 54) = 81.946, p < .0001    F(2, 54) = 4.988, p = .01      F(2, 54) = 3.488, p = .0376
4.3.4. Discussion of Experiment 6
Tendency toward less reduction for function words in L2 English
The following findings emerge from the results of Experiment 6:
• All L1 English speakers durationally neutralized monosyllabic function
words and unstressed syllables of content words.
• On the other hand, in L2 English production by native Japanese speakers,
monosyllabic function words are less reduced than unstressed syllables of
content words (except for AE1, who showed a native-like pattern).
The observed tendency toward less reduction of a monosyllabic function word in L2
English–L1 Japanese suggests that Japanese learners of English tend to parse English
monosyllabic function words not as phonological clitics, but as independent words or
prosodic units carrying stress. This difficulty in the cliticization of a function word can be
explained by either the negative transfer of L1 characteristics or a universal constraint on
prosodic development. If the difficulty is due to L1 negative transfer, which feature(s) of
L1 Japanese is relevant? One potential candidate is an L1 Japanese morphosyntactic
characteristic, i.e., Japanese grammatical morphemes are suffixes, not words. Therefore,
it may be cognitively challenging for native speakers of Japanese to cliticize English
function words phonologically and reduce them durationally as much as they reduce
unstressed syllables of content words.
Japanese speakers’ difficulty in the cliticization of English function words could also be
due to a general learning constraint on speech development. In her study of coarticulation
in adults and children, Nittrouer (1994) investigated the production of the schwa syllable
which preceded the target monosyllables (CV) in the carrier phrase, “it’s a (CV), Bob”.
Her results showed that the children’s production of the schwa syllable was consistently longer
than that of adults, although the production of schwa is simple in terms of gestural
organization. Nittrouer explained this pattern in the following way:
To a great extent, the production of schwa in this case could be modeled
with simple jaw lowering, followed by jaw raising. According to all other
analyses, these children seemed capable of executing jaw movements with
adult-like timing. Why then were their schwas so long? One possible
explanation may be that these children were treating this function word more
like a stressed syllable than the adults were. (Nittrouer 1994, p. 970)
Nittrouer’s finding echoes the aforementioned difficulty in the cliticization of English
function words emerging from the results of Experiment 6. In this experiment, native
speakers of Japanese replaced each syllable of short English sentences with the simple open
syllable /no/.
This syllable should not be difficult in terms of gestural organization, especially since only
fluent speakers of reiterant speech were selected for the recording. The parallel difficulty of
cliticizing function words between English-speaking children and Japanese learners of
English suggests that it is cognitively challenging to group different words phonologically
into a single prosodic unit.
4.4. Experiment 7 (Japanese)
4.4.1. Expected patterns in L2 Japanese
As mentioned in Section 4.2.2, Japanese can have a long train of unaccented moras. Also,
as mentioned in Section 1.5.3, Japanese displays the effect of phrase-final lengthening at
the end of the IP (Intonational Phrase). Considering these two factors, we expect the
following pattern in the production of a Japanese sentence by L1 Japanese speakers when
the sentence is phrased as one IP or AP (Accentual Phrase) with no phrase-medial
break.
Expected L1 Japanese pattern:
In L1 Japanese production, there is no significant durational difference among
non-sentence-final moras.
Since in English stresses tend to obey a rhythmic tendency towards even spacing and
sequences of many unstressed syllables are avoided, native English speakers are likely to
negatively transfer this characteristic of L1 English to their production of L2 Japanese. If
this is correct, we expect the following pattern in L2 Japanese–L1 English:
Expected L2 Japanese pattern:
Native speakers of English will tend to introduce accents in Japanese sentences
consisting of unaccented moras. Accented moras in their L2 Japanese
production are lengthened and produced with longer duration than unaccented
moras.
4.4.2. Method
Subjects
The set of speakers for Experiment 7 included one control group and two experimental
groups. The control group of this experiment consisted of 4 native speakers of Japanese
(Tokyo dialect) who were college students in Tokyo, Japan (NJ1, NJ2, NJ3 and NJ4).
All were female except NJ1. The experimental groups were made up of three
native speakers of American English learning Japanese (L2 Japanese speakers, henceforth):
two advanced speakers of L2 Japanese (AJ1, AJ2) and one beginning speaker of L2
Japanese (BJ1). The three speakers of L2 Japanese also participated in Experiments 2 and
5. Refer to Section 2.4.1 for information on the learning backgrounds of L2 speakers.
AJ1, AJ2 and BJ4 in Experiments 2 and 5 correspond to AJ1, AJ2 and BJ1 in Experiment
7, respectively.
Speech Materials
The corpus of Experiment 7 contained Japanese nouns without lexical pitch accent varying
in length from 3 to 4 moras. The words were embedded in the following pro-drop
structure ending with the inflectional morpheme -da, which corresponds to the copula in
English:
[ X ] - da
‘(It) is X’
4-mora sentence (3-mora noun + -da)    5-mora sentence (4-mora noun + -da)
okane-da                               tomodatʃi-da
money-COP                              friend-COP
‘It’s money’                           ‘It’s a friend’
These two target words were selected from the list of words taught in the UCLA Japanese
language program.
A potential confounding factor is the variation of intonational patterns across speakers
or across different repetitions of the same token for the same speaker. I use the pro-drop
frame sentence in order to elicit the production of each sentence as one Accentual Phrase
(AP). This will minimize the risk of intonation variation in the experiment. It has been
pointed out that crosslinguistically sentences tend to break up into multiple tonal groups at
syntactic boundaries, as the size of phrases increases. One boundary that is likely to cause
a tonal break-up is that between a subject and a predicate. There is no boundary of this sort
in the frame sentence of Experiment 7, which only retains the complement and the
inflectional morpheme -da. Thus, the use of this structure is expected to minimize the risk
of tonal breakups.
Recording
The two target sentences were mixed with foil sentences. Sentences in each reading of the
list were pseudo-randomized in different orders. To avoid segmental effects on duration
while preserving prosodic patterns, the participants were asked to replace each mora of the
text with the mora /no/ (reiterant speech), as in Experiment 6 (English). For example,
[tomodatʃi-da] had to be read as /nonononono/.
In the recording session, PsyScope was used to present sentences. One sentence at a
time was displayed on the computer screen. The participants were given sufficient
time to practice reiterant speech ahead of recording. We made sure that all our subjects
were able to perform the task. None of them exhibited difficulties. The speakers had to
read each sentence ten times. The first reading was not analyzed.
Data were recorded in the recording booth of the UCLA phonetics lab for L2 Japanese,
and in the recording room of Meiji Gakuin University Information Center in Tokyo for L1
Japanese.
Measurement
The recorded data were converted from analog to digital at a 10 kHz sampling rate. The
data were analyzed for duration patterns, following the method used in Experiment 6.
Before measurements, we made sure that a sentence was produced as one prosodic phrase
(i.e., Accentual Phrase) with neither hesitation nor break in the middle of a sentence.
4.4.3. Results of Experiment 7 (Japanese)
L1 Japanese patterns
The mean duration and standard deviation of the vowel /o/ in each mora position of the 4-mora (okane-da) and 5-mora sentences (tomodatSi-da) are plotted for L1 Japanese speakers
in Figures 4-3a, b, respectively. The following effects were consistently shown in both 4-mora and 5-mora sentences by all four speakers of L1 Japanese:
• The first mora is the shortest among all moras of the sentence.
• The last mora is the longest among all moras of the sentence.
Special properties of the initial and final positions of a prosodic phrase have been found in
a number of studies, and they are known as edge effects (see Fougeron 1999 and Cambier-Langeveld 2000 for reviews of phrase-initial and -final effects, respectively). I consider
the two systematic patterns presented above as edge effects in Japanese. The longest
duration of the last mora in a sentence/Intonational Phrase is due to sentence-final
lengthening, which was also observed in earlier studies of Japanese (e.g., Ueyama 1999
for experimental evidence). The shortest duration of the first vowel can be interpreted as
the result of a sentence-initial shortening effect. As far as I know, this has
never been observed in earlier studies of Japanese.
Figure 4-3a: Mean duration & standard deviation of vowel /o/
of 4-mora unaccented sentence (okane-da) for L1 Japanese speakers
[Bar graphs, one panel per speaker (NJ1, NJ2, NJ3, NJ4); x-axis: mora position (1-4); y-axis: /o/ duration (ms).]
(Letters on the top of bar columns indicate the grouping of means on the
basis of ANOVAs and post-hoc tests.)
Figure 4-3b: Mean duration & standard deviation of vowel /o/
of 5-mora unaccented sentence (tomodatSi-da) for L1 Japanese speakers
[Bar graphs, one panel per speaker (NJ1, NJ2, NJ3, NJ4); x-axis: mora position (1-5); y-axis: /o/ duration (ms).]
(Letters on the top of bar columns indicate the grouping of means on the
basis of ANOVAs and post-hoc tests.)
There are two and three moras in the phrase-medial position of the 4-mora and 5-mora
sentences, respectively, and these medial moras are not subject to edge effects. For each
speaker, the statistical significance of a mean difference was tested across different medial
mora positions in each sentence type, conducting a series of one-factor ANOVAs (tested
mora positions were the 2nd vs. 3rd for the 4-mora sentence, and the 2nd vs. 3rd vs. 4th
for the 5-mora sentence). The grouping patterns of phrase-medial moras are indicated by
letters above the bars of the graphs in Figures 4-3a, b (i.e., a, b, c). ANOVA results are
shown in Table 4-6.
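As a rough illustration of the per-speaker test described above, the following sketch computes a one-factor ANOVA F statistic from scratch. The duration values are hypothetical placeholders, not measurements from Experiment 7, and the function name `one_way_anova` is introduced here for illustration only. With nine analyzed readings per mora position, two positions yield 18 observations and hence F(1, 16); three positions yield 27 observations and F(2, 24).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-factor ANOVA over lists of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical /o/ durations in ms (NOT data from the experiment): 9 tokens per position.
second_mora = [92, 88, 95, 90, 91, 87, 93, 89, 94]
third_mora = [85, 83, 88, 84, 86, 82, 87, 81, 85]

f_value, df_b, df_w = one_way_anova([second_mora, third_mora])
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

The sketch assumes some within-group variation; if every group were perfectly constant, the within-group mean square would be zero and the ratio undefined.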
Table 4-6: ANOVA results for phrase-medial moras
in 4-mora and 5-mora sentences produced by L1 Japanese speakers (α = 0.01)

                  NJ1                NJ2                NJ3                NJ4
4-mora sentence   F(1, 16) = 1.75    F(1, 14) = 3.21    F(1, 16) = 9.61    F(1, 16) = 2.2
                  p = .2040          p = .095           p = .007           p = .1571
5-mora sentence   F(2, 20) = 1.72    F(2, 24) = 21.09   F(2, 24) = 10.64   F(2, 24) = 1.36
                  p = .2059          p < .001           p = .0005          p = .2757
For the 4-mora sentence, mora position (2nd vs. 3rd moras) did not significantly affect the
duration of the unaccented /o/ for the data of all L1 Japanese speakers except NJ3. For the
5-mora sentence, there were significant effects of mora position (2nd vs. 3rd vs. 4th
moras) for NJ2 and NJ3, but not for NJ1 and NJ4.
For the data of NJ2 and NJ3, Scheffe’s post-hoc tests were additionally conducted in order to find how the three mora
positions are distinguished by the duration of the vowel /o/. The grouping of the mora
positions is summarized in Table 4-7, on the basis of results of ANOVAs and Scheffe’s
post-hoc tests.
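The significance decisions reported above follow directly from comparing each p-value in Table 4-6 with α = 0.01. A minimal sketch of that comparison is given here; the dictionary layout and the function name `significant_speakers` are assumptions for illustration, and the entry reported only as p < .001 is stored as its upper bound.

```python
ALPHA = 0.01  # significance level stated in the caption of Table 4-6

# p-values copied from Table 4-6; "p < .001" is stored as its upper bound .001.
P_VALUES = {
    "4-mora": {"NJ1": 0.2040, "NJ2": 0.095, "NJ3": 0.007, "NJ4": 0.1571},
    "5-mora": {"NJ1": 0.2059, "NJ2": 0.001, "NJ3": 0.0005, "NJ4": 0.2757},
}

def significant_speakers(p_by_speaker, alpha=ALPHA):
    """Speakers whose mora-position effect is significant at the given alpha."""
    return [speaker for speaker, p in p_by_speaker.items() if p < alpha]

for sentence, p_by_speaker in P_VALUES.items():
    print(sentence, significant_speakers(p_by_speaker))
```

Only the speakers returned for the 5-mora sentence require post-hoc comparisons, since a two-level factor (2nd vs. 3rd mora) needs no further pairwise testing.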
Table 4-7: Grouping of mora positions by L1 Japanese speakers

                  NJ1            NJ2            NJ3            NJ4
4-mora sentence   ( 2, 3 )       ( 2, 3 )       2 > 3          ( 2, 3 )
5-mora sentence   ( 2, 3, 4 )    4 > 2 > 3      ( 2, 4 ) > 3   ( 2, 3, 4 )
We find two major patterns in the L1 Japanese data. First, there is no difference across
different mora positions: NJ1, NJ2 and NJ4 for the 4-mora sentence, and NJ1 and NJ4 for
the 5-mora sentence. Second, duration is longer in the 2nd than 3rd position of the 4-mora
sentence (NJ3), or in the 2nd and 4th positions than in the 3rd position of the 5-mora
sentence (NJ2 and NJ3). Note that this relation is observed in the individual plots of all
four speakers in Figure 4-3b. All these patterns suggest that the 3rd mora tends to be
shorter than the 2nd and 4th moras.
L2 Japanese accent patterns
In L1 Japanese, no pitch accent was produced in either the 4-mora or in the 5-mora
sentence types. Accent patterns were transcribed for all L2 Japanese speakers, in order to
see whether they could produce a native-like sequence of unaccented moras. Our way of
transcribing accent patterns was mostly based on the traditional method of kokugogaku
(traditional Japanese linguistics), in which it is assumed that each mora is associated with
either H or L tones. In addition, a pitch accent, i.e., an H tone immediately followed by a
sharp pitch fall, was indicated by H*L.
Accent transcriptions of L2 Japanese are
presented, along with the ones for L1 Japanese, in Table 4-8. The total number of tokens
for each sentence type is 9 (except for the 5-mora sentence for BJ).
Table 4-8: Accent patterns in L1 and L2 Japanese

                  L1 Japanese   AJ1                AJ2            BJ
4-mora sentence   9 (LHHH)      9 (LHHH)           8 (LHHH)       9 (LH*LL)
                                                   1 (LLHH)
5-mora sentence   9 (LHHHH)     9 (LLLLL)          9 (H*LLLL)     3 (LLH*LL)
                                no phrasal H-                     5 (H*LH*LL)
Speaker AJ1 produced a sequence of unaccented moras reliably for both sentence types. In
contrast, BJ did not show any instance of an unaccented phrase: he placed an accent on the
2nd mora in all 9 repetitions of the 4-mora sentence; an accent on the 3rd mora in all 9
cases and an additional accent on the 1st mora in 5 out of 9 for the 5-mora sentence.
Finally, AJ2 produced a native unaccented phrase in 8 out of 9 tokens for the 4-mora
sentence, but he produced a phrase with a pitch accent on the first mora for the 5-mora
sentence.
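The classification of tokens in Table 4-8 can be made explicit with a small sketch. It assumes only what the transcription convention states: each mora carries H or L, and a pitch accent is marked H*. The names `TABLE_4_8`, `native_pattern`, `has_pitch_accent` and `summarize` are hypothetical helpers introduced for illustration.

```python
# Accent transcriptions and token counts as listed in Table 4-8 (L2 speakers only).
TABLE_4_8 = {
    "AJ1": {4: {"LHHH": 9}, 5: {"LLLLL": 9}},
    "AJ2": {4: {"LHHH": 8, "LLHH": 1}, 5: {"H*LLLL": 9}},
    "BJ": {4: {"LH*LL": 9}, 5: {"LLH*LL": 3, "H*LH*LL": 5}},
}

def native_pattern(n_moras):
    """Unaccented L1 pattern in Table 4-8: initial L, then H on every later mora."""
    return "L" + "H" * (n_moras - 1)

def has_pitch_accent(pattern):
    """A pitch accent is transcribed as H* (H followed by a sharp fall)."""
    return "H*" in pattern

def summarize(speaker):
    """Per sentence length: total tokens, accented tokens, and fully native-like tokens."""
    rows = {}
    for n_moras, patterns in TABLE_4_8[speaker].items():
        rows[n_moras] = {
            "tokens": sum(patterns.values()),
            "accented": sum(c for p, c in patterns.items() if has_pitch_accent(p)),
            "native": patterns.get(native_pattern(n_moras), 0),
        }
    return rows

for speaker in TABLE_4_8:
    print(speaker, summarize(speaker))
```

Separating "accented" from "native" keeps apart two ways of diverging from the L1 pattern: inserting a pitch accent, and omitting the phrasal H- while remaining unaccented.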
L2 Japanese vowel duration patterns
The mean duration and standard deviation of the vowel /o/ are shown for L2 Japanese
speakers and a representative speaker of L1 Japanese (Speaker NJ4) for each mora position
in the 4-mora sentence (okane-da) in Figure 4-4a. The same information is presented for
the 5-mora sentence (tomodatSi-da) in Figure 4-4b.
Figure 4-4a: Mean duration & standard deviation of vowel /o/
of 4-mora unaccented sentence (okane-da) for one L1 and three L2 Japanese speakers
[Bar graphs, one panel per speaker (NJ4, AJ1, AJ2, BJ); x-axis: mora position (1-4); y-axis: /o/ duration (ms).]
(Letters on the top of bar columns indicate the grouping of means on the
basis of ANOVAs and post-hoc tests.)
Figure 4-4b: Mean duration & standard deviation of vowel /o/ of 5-mora
unaccented sentence (tomodatSi-da) for one L1 and three L2 Japanese speakers
[Bar graphs, one panel per speaker (NJ4, AJ1, AJ2, BJ); x-axis: mora position (1-5); y-axis: /o/ duration (ms).]
(Letters on the top of bar columns indicate the grouping of means on the
basis of ANOVAs and post-hoc tests.)
As with the L1 Japanese data, a series of one-factor ANOVAs were conducted to look
for differences between medial moras (not including the initial and final moras of each
sentence type, which are expected to be subject to edge effects).
ANOVA results are
shown in Table 4-9. Scheffe’s post-hoc tests were additionally conducted for the data of
the 5-mora sentence in order to find which of the three medial moras were significantly
different. The grouping patterns of phrase-medial moras are indicated by letters above the
bars of the graphs in Figures 4-4a, b (i.e., a, b, c), and also summarized in Table 4-10, on
the basis of results of ANOVAs and Scheffe’s post-hoc tests.
Table 4-9: ANOVA results for phrase-medial moras
in 4-mora and 5-mora sentences produced by three L2 Japanese speakers (α = 0.01)

                  AJ1                AJ2                BJ
4-mora sentence   F(1, 16) = .048    F(1, 14) = 7.411   F(1, 16) = 23.36
                  p = .8298          p = .0165          p = .0002
5-mora sentence   F(2, 18) = 5.669   F(2, 24) = 2.957   F(2, 24) = 15.5
                  p = .0123          p = .0711          p < .0001
Table 4-10: Grouping of mora positions by one L1 and three L2 Japanese speakers

                  NJ4            AJ1            AJ2            BJ
4-mora sentence   ( 2, 3 )       ( 2, 3 )       ( 2, 3 )       2 > 3
5-mora sentence   ( 2, 3, 4 )    ( 2, 3, 4 )    ( 2, 3, 4 )    ( 2, 4 ) < 3
As shown in Figure 4-4a, for the 4-mora sentence (okane-da), AJ1 and AJ2 did not
produce any significant mean difference between the 2nd and 3rd moras, while this
difference was significant for BJ. The significantly longer duration of the second mora in
BJ’s production is probably related to the constant placement of an accent on the second
mora, as mentioned earlier (see Table 4-8).
As for the 5-mora sentence (tomodatSi-da), AJ1 and AJ2 did not show any significant mean
difference among the three phrase-medial moras. Speaker BJ
produced the 3rd mora significantly longer than the 2nd and 4th moras, which were not
distinguished from each other. Note that in his production the 3rd mora of the 5-mora sentence was
consistently accented, like the 2nd mora of the 4-mora sentence.
4.4.4. Discussion of Experiment 7
L1 Japanese patterns
The results of Experiment 7 display the following major patterns for L1 Japanese:
• Two types of systematic prosodic effects on vowel duration were observed:
  phrase-initial shortening and phrase-final lengthening.
• As a general tendency, there is no significant durational difference among
  medial moras in L1 Japanese production. If there is a difference, the 2nd
  and 4th moras are longer than the 3rd mora.
Phrase-final lengthening and the absence of durational difference across medial moras
confirm the expected patterns, which were presented in Section 4.4.1. However, the trend
toward the longer duration of the 2nd and 4th moras was not expected.
There are three possible explanations for the observed tendency toward lengthening of
the 2nd and 4th moras. The first explanation is that the observed pattern may be due to the
effects of bimoraic foot structure and foot-final lengthening: a foot-final mora is longer than
a foot-initial mora (e.g., oka#neda, tomo#datSi#da).
The presence of bimoraic foot
structure has been proposed to explain phenomena related to various properties of Japanese
in various types of word formation processes, such as compounding, blending and
borrowing (e.g., Tateishi 1989; Ito 1990; Mester; Poser 1990; Kubozono 1995).
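To make the foot-based account concrete, the sketch below parses a mora sequence into bimoraic feet from the left edge and lists the foot-final positions. Left-to-right exhaustive parsing with a leftover final mora is an assumption chosen so that the output matches the parses given above (oka#neda, tomo#datSi#da); the function names are hypothetical.

```python
def parse_bimoraic_feet(moras):
    """Group moras into two-mora feet from the left edge; a leftover mora stands alone."""
    return [moras[i:i + 2] for i in range(0, len(moras), 2)]

def foot_final_positions(moras):
    """1-based positions of moras that close a complete bimoraic foot."""
    return [i + 2 for i in range(0, len(moras) - 1, 2)]

def show(moras):
    """Render the parse with '#' between feet."""
    return "#".join("".join(foot) for foot in parse_bimoraic_feet(moras))

okane_da = ["o", "ka", "ne", "da"]
tomodatsi_da = ["to", "mo", "da", "tSi", "da"]

for sentence in (okane_da, tomodatsi_da):
    print(show(sentence), foot_final_positions(sentence))
```

Under this parse the 2nd and 4th moras are foot-final in both sentences, which is where foot-final lengthening would be expected; in the 4-mora sentence the 4th mora is also phrase-final, so the two effects cannot be separated there.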
Alternatively, the longer duration of the 2nd and 4th moras could be caused by the
combination of the effects of a phrasal H- tone on the second mora and the spreading of
sentence-final lengthening effects back to the second last mora (i.e., the penultimate). The
AP is the domain of lexical accent patterns in Japanese as mentioned in Section 1.5.1, and
the H- phrasal tone is linked with the second mora of an AP. Similarly, in Korean the
phrasal H- tone of the AP is associated with either the second mora or the first two moras
(depending on syllable types).
Jun (1995) found that the second syllable (constantly
associated with a phrasal H- tone) is the longest in both sentence-initial and -medial APs.
This lengthening effect of a phrasal H- on the 2nd syllable in the Korean AP seems to go in
the same direction as the constantly longer duration of the 2nd mora in the Japanese AP, as
emerging from our L1 Japanese data.
Finally, the data of L1 Japanese also showed that in the 5-mora sentence the 4th mora
tends to be longer than both the 3rd mora and the 2nd mora. This could be due to the
spreading effect of sentence-final lengthening back to the penultimate mora, i.e., the 4th
mora in the 5-mora sentence. If this is correct, the 3rd mora of the 4-mora sentence, which
is also the second-to-last mora, should be affected by the spreading of sentence-final
lengthening, and consequently be lengthened. Indeed, this appears to be the case. This
interpretation can also explain why the 3rd mora of the 4-mora sentence, which is also a
penult, is not as short as the 3rd mora of the 5-mora sentence. Of course, it is possible that
all three effects discussed (binary-foot constraint, phrasal H- lengthening and spreading of
final lengthening) affect the duration patterns of Japanese moras. None of these effects has
been investigated in experimental studies (except for Kondo’s (1999) investigation of the
binary-foot constraint on coarticulation). Further investigation is needed.
Production of consecutive unaccented moras in L2 Japanese–L1 English
In Experiment 7, we investigated the accent and duration patterns of two advanced speakers and one
beginning speaker of L2 Japanese (AJ1, AJ2 and BJ). Accent patterns produced by these
speakers are shown along with those of the L1 Japanese speakers in Table 4-11 (adapted
from Table 4-8).
Table 4-11: Accent patterns in L1 and L2 Japanese

               L1 Japanese   AJ1                AJ2         BJ
okane-da       LHHH          LHHH               LHHH        LH*LL
                                                LLHH
tomodatSi-da   LHHHH         LLLLL              H*LLLL      LLH*LL
                             no phrasal H-                  H*LH*LL
BJ did not produce any native-like unaccented phrase. His consistent pattern was to place a
pitch accent (H*) on the penult of each target noun18 (i.e., okáne and tomodátSi) and an
additional accent on the first mora of tomodatSi (i.e., tómodátSi).
BJ’s systematic
placement of accents on Japanese unaccented words can be explained by the negative
transfer of the lapse constraint in L1 English, which militates against a sequence of many
unaccented moras/syllables. This speaker’s favoring of a penultimate accent may be due to
the influence of English loanword phonology. It is well known that stress tends to be
placed on the penultimate syllable when foreign words are borrowed into English. This
stress pattern is systematic, at least, for loanwords from Japanese: e.g., karáte, teriyáki,
harakíri and tSiraSizúSi. This pattern agrees with the accent pattern of the two target words
of Experiment 7: okáne and tomodátSi as produced by BJ.
While Speaker BJ systematically showed one type of non-native accent pattern, AJ1
was consistently able to produce a sequence of unaccented syllables for both sentence types
(i.e., he did not produce an H tone immediately followed by an L tone). This speaker also placed a
phrasal H- on the second mora of okane-da (LHH-H), which is a characteristic of Tokyo
Japanese, but he did not produce H- for tomodatSi-da (LLLL-L instead of LHHH-H). AJ2
showed both native and non-native accent patterns. The mixture of native and non-native
patterns in the data of these speakers shows that they have learned to produce a sequence of
unaccented moras, but that their production of a native-like unaccented phrase (i.e., LHHH) is not yet stable.
Differences in accent patterns across beginning and advanced L2 Japanese
Two crucial differences are found between advanced and beginning speakers of L2
Japanese. The first difference is that the beginning speaker (BJ) could not produce a
sequence of unaccented moras, while advanced speakers (AJ1 and AJ2) could (even
though AJ2’s production was not stable).
The second difference between beginning and advanced speakers of L2 Japanese is in
which accent type is used, stress accent or non-stress accent. In Figures 4-5a and 4-5b, the
waveforms and pitch contours of the reiterant speech of tomodatSi-da are shown for BJ and
AJ2. The accent pattern of BJ’s production is H*LH*LL, while that of AJ2 is H*LLLL.
In BJ’s production, the first and third moras receive an H tone immediately followed by an L
tone (i.e., H*+L in Figure 4-5a), which is indicated by two pitch peaks aligned with these
moras. Notice that these accented moras are longer in duration and greater in intensity than
the other three unaccented moras, as shown in the waveform of Figure 4-5a. As mentioned
in Chapters 1 and 2, English stress accent is characterized by higher pitch, longer duration
and greater intensity, and all these three acoustic features are manifested in the accents of
BJ’s production. Thus, we can conclude that BJ uses English stress accent in his L2
Japanese production: i.e., the negative transfer of English stress accent in L2 Japanese.
18 Here we mean not a target sentence but a target word. Recall that the 4-mora and 5-mora target
sentences of Experiment 7 (okane-da and tomodatSi-da) were formed by embedding the 3-mora and 4-mora
target nouns okane and tomodatSi in a pro-drop structure ending with the inflectional morpheme -da.
Figure 4-5a: Waveform and pitch contour of “tomodatSi-da” in BJ’s production
[Waveform and pitch track of the reiterant utterance /no no no no – no/ by BJ; two pitch peaks labeled H*+L; y-axis: pitch (Hz); x-axis: time (ms).]
Figure 4-5b: Waveform and pitch contour of “tomodatSi-da” in AJ2’s production
[Waveform and pitch track of the reiterant utterance /no no no no – no/ by AJ2; one pitch peak labeled H*+L; y-axis: pitch (Hz); x-axis: time (ms).]
Figure 4-5b shows that AJ2 placed an accent on the first mora of the 5-mora sentence,
as is indicated by the alignment of the pitch peak of the entire phrase with this mora. In
AJ2’s production, the first accented mora is not as long and loud as the second mora,
which is not accented. The acoustic contrast between the first accented and second
unaccented moras is much smaller in AJ2’s than BJ’s production. All this suggests that
while BJ uses English stress accent in his production of L2 Japanese, AJ2 does not.
Learning two properties of Japanese prosody: accent and duration patterns
The following duration patterns of phrase-medial moras and accent patterns of the 4-mora
and 5-mora sentences were displayed by the three speakers of L2 Japanese in Experiment
7:
• BJ, a beginning speaker of L2 Japanese, made the penult of each target
  word significantly longer than the other medial moras (i.e., okane-da and
  tomodatSi-da). The penult of both target nouns was always accented in his
  production.
• AJ1, who consistently produced a sequence of unaccented syllables,
  showed native-like duration patterns, i.e., he did not make any significant
  difference in duration across phrase-medial unaccented moras.
• AJ2 successfully produced a native-like unaccented pattern for okane-da,
  but he placed a pitch accent (H*) on the first mora of tomodatSi-da. In both
  sentence types, he did not place any accent on phrase-medial moras and did
  not make any significant difference in duration across phrase-medial moras.
What emerges from these results is that accent and duration patterns seem to develop
separately in the acquisition of L2 Japanese prosody. This point can be made clear by the
re-examination of the case of tomodatSi-da. Individual plots of the three speakers of L2
Japanese, adapted from Figure 4-4b, are annotated for accent patterns and shown in Figure
4-6. In this figure, the first mora is additionally included for post-hoc analysis, in order to
find how all non-final moras are differentiated by duration. The grouping of non-final
moras is indicated by letters in the figure (i.e., a, b). As for accent patterns, the native-like
pattern does not have any pitch accent (H*). While AJ1 produced a native-like unaccented
pattern, AJ2 and BJ did not. AJ2 always placed a pitch accent on the first mora. BJ
always placed an accent on the third mora and added another accent on the first mora for
some tokens of tomodatSi-da. As for duration patterns, the baseline in L1 Japanese
patterns is that there is no significant difference across medial moras regardless of accent19.
BJ produced the accented third mora with significantly longer duration than the other
moras, which is a non-native-like pattern. AJ2, who did not produce a native-like accent
pattern, showed a native-like duration pattern like AJ1.
Figure 4-6: Mean duration and standard deviation of the vowel /o/ of “tomodatSi-da”
for a L1 Japanese speaker (NJ4) and three L2 Japanese speakers (based on Figure 4-4b)
[Bar graphs, one panel per speaker; AJ2 and BJ: pitch-accented (H*); AJ1 and NJ4: no pitch accent; x-axis: mora position (1-5); y-axis: /o/ duration (ms).]
(Letters on the top of bar columns indicate the grouping of means on the
basis of ANOVAs and post-hoc tests.)
19 The data of L1 Japanese in Experiment 7 showed a systematic tendency toward the longer duration of the
2nd and 4th moras. However, note that this is not conditioned by the distribution of pitch accents since all
test sentences of this experiment consisted of unaccented moras.
The distribution of native-like and non-native-like patterns of accent and duration in
production by the three speakers of L2 Japanese can be summarized in the following way
(positive and negative symbols indicate native-like and non-native-like patterns,
respectively):
        L2 Japanese        L2 Japanese
        accent patterns    duration patterns
AJ1     +                  +
AJ2     –                  +
BJ      –                  –
These learning patterns suggest that native speakers of English can learn native-like
Japanese duration patterns before accent patterns. However, more data from other L2
Japanese speakers whose native language is English are needed to make this suggestion a
true generalization.
Transfer effects in L2 Japanese
The predicted pattern for L2 Japanese, as discussed in Section 4.4.1, is the following:
Native speakers of English will tend to introduce accents in a Japanese sentence
consisting of unaccented moras.
Accented moras in their L2 Japanese
production are lengthened and produced with longer duration than unaccented
moras.
The results of Experiment 7 show the strong effect of L1 English characteristics on both
accent and duration patterns in the initial stage of learning Japanese by English speakers. A
beginning speaker of L2 Japanese, BJ, constantly placed accent(s) on a sequence of
Japanese unaccented moras, respecting the English lapse constraint and loanword
phonology, and he produced those accents by employing the acoustic features of English
stress accent.
These results confirm the aforementioned predicted pattern of L2
Japanese–L1 English.
The comparison of beginning and advanced speakers of L2 Japanese suggests a
possible developmental path of accent and duration patterns: native-like duration patterns
(i.e., no durational difference between phrase-medial moras) are learned before native-like
accent patterns (i.e., the phrase-initial H tone on the second mora and no pitch accent on
unaccented APs). At the moment, this tendency is unexplained.
Effects of instruction in Japanese as a foreign/second language
English speakers’ overall difficulty in learning a sequence of Japanese unaccented moras
could be exacerbated by the absence of explicit and systematic instruction of Japanese
accent patterns in Japanese classrooms. What emerged from interviews and a survey
of the L2 learning background of the English speakers who participated in the present
study is that a majority of speakers had neither systematic instruction nor continuous
practice in Japanese accent patterns. Indeed, this is a common situation in teaching
Japanese as a foreign/second language. This low emphasis on the instruction of Japanese
accent patterns may be due to the fact that the distinctive function of Japanese accent
patterns is not as prominent as in tone languages such as Chinese or Thai (e.g., Beckman
1986 for a review). The low distinctive function of Japanese accent is probably one of the
causes of the rich variation in accent patterns across different Japanese dialects or across
different generations within each region.
In spite of remarkable differences in accent
patterns, speakers of different Japanese varieties can understand each other.
Perhaps because of this, teaching accent patterns appears to be a low priority in the curricula of
Japanese as a foreign/second language.
4.5. Summary of Experiments 6 and 7
In Experiments 6 and 7, we investigated temporal organization across syllables in L2
English and L2 Japanese, respectively.
In Experiment 6, we examined the
morphosyntactic aspect of temporal organization, focusing on how Japanese speakers of
L2 English learn to cliticize and reduce the duration of English function words. The results
of this experiment showed that Japanese learners of English tend to produce English
monosyllabic function words with longer duration than unstressed syllables of content
words. This indicates that in the temporal organization of their L2 English, function words
are treated not as phonological clitics, but as independent words or prosodic units carrying
their own stress. This tendency can be explained by either the negative transfer of L1
characteristics or by a universal learning constraint on producing function words as
phonological clitics.
In Experiment 7, we investigated how English speakers of L2 Japanese treated
Japanese phrases consisting of only unaccented moras, in order to find whether the lapse
constraint in the L1 English timing system affects the production of L2 Japanese or not.
The results of Experiment 7 showed that English speakers of L2 Japanese tend to insert
accents in a train of unaccented moras and also realize those accents using the acoustic
features of English stress, i.e., characteristics of L1 English timing are transferred to L2
Japanese. The comparison of beginning and advanced L2 Japanese suggests that it may be
harder to learn accent patterns than duration patterns in the production of Japanese phrases.
More data are needed to confirm this possibility.
Chapter 5:
Awareness of L2 Syllable Structures
In Chapters 2-4 of the present study, I presented a series of phonetic experiments
investigating how the L1 prosodic system affects the production of L2 prosody in various
respects. An additional issue relevant to a better understanding of L2 prosodic
development is how L2 speakers treat the syllable structures of their target language. Are
L2 speakers aware of L2 syllable structure in the same way in which they are aware of L1
syllable structure? Does the phonological awareness of L2 syllable structure change over
the time course of L2 speech development? How is the awareness of L2 syllable structure
related to the phonetic patterns of L2 speech? The main goal of this chapter is to provide a
partial answer to these questions.
5.1. English and Japanese syllable structure
Syllable complexity
Languages differ in terms of the possible types of syllable structure they allow. As
mentioned earlier, English allows a more complex syllable structure than Japanese does, as
illustrated in Table 5-1.
Table 5-1: Syllable structure in English and Japanese

                     English                Japanese
syllable structure   (C)(C)(C)V(C)(C)(C)    (C)V({V, C}) (Kubozono 1989)
An English syllable can have a maximum of three consonants before and after the nuclear
vowel (i.e., in the onset and the coda of the syllable, respectively), while a Japanese
syllable allows a maximum of one consonant in each position. Furthermore, Japanese
allows only a nasal stop or the first half of a long consonant (e.g., pan ‘bread’ and katta
‘bought’, respectively) in coda position, while English allows more types of consonants.
Awareness of internal syllable structures
English and Japanese speakers attribute different internal constituency to the syllables of
their respective languages: Japanese speakers posit a subsyllabic unit, the mora, while
English speakers are not aware of the presence of moras. This difference affects the way in
which speakers segment words into minimal prosodic units. Native English speakers use
the syllable as the minimal segmentation unit, while native Japanese speakers use the mora
in word segmentation, as mentioned in Section 1.4.4. For example, the English word ten
is not further segmented by native speakers of English, since this word is monosyllabic.
The same word was borrowed and lexicalized in Japanese. Native Japanese speakers are
also aware of the fact that this word is monosyllabic, but they further break it into two
moras. This difference in the way in which syllable-internal constituents are perceived can
be represented by the graphs in Figure 5-1 (for Japanese, the kind of structure proposed by
Kubozono (1989) is assumed)20.
20 In phonological theories, the mora is used to represent syllable weight in order to account for the fact that
some phonological phenomena are systematically constrained by how heavy syllables are. For example, in
English, a CV with a lax vowel and a CV with a tense vowel or a diphthong are classified as light and
heavy syllables, respectively. This classification is based on the fact that a CV with a lax vowel shows
phonological behaviors different from those of a CV with a tense vowel or a diphthong. For example, in
English, a light CV by itself cannot form an independent word (*/bI/), while a heavy CV can form a word
by itself (/bi/ and /baI/). The internal structures of the two types of syllables are distinguished by different
representations at the moraic level: light syllables are represented as monomoraic and heavy syllables are
represented as bimoraic. In the present chapter, we are dealing with speakers’ awareness of moras, and not
with whether moras are active or not in phonological processes. Thus, for our purposes it makes sense to
assume a representation of English syllables such as the one in Figure 5-1.
Figure 5-1: Syllable structures of English and Japanese in native speakers’ awareness
[Tree diagrams. English: word (ω) > syllable (σ) > segments /tEn/ ‘ten’. Japanese: word (ω) > syllable (σ) > two moras (µ µ) > segments /t e n/ ‘point’.]
In Figure 5-1, the internal representations of the English word /tEn/ and the Japanese word
/ten/ ‘point’ are compared. The English word consists of three segments forming a CVC
syllable. Similarly, the Japanese word consists of three segments forming a syllable, but
the sequence can be treated as either one unit (one syllable) or two (two moras) by native
Japanese speakers: /ten/ –> /ten/ or /te + n/. A vowel can constitute one syllable and one
independent mora (e.g., /i/ ‘stomach’, /e/ ‘painting’ and /o/ ‘tail’), but an onset consonant
cannot stand alone as either a syllable or a mora. On the other hand, a coda nasal forms an
individual mora, and it is called hatsuon or moraic nasal and often transcribed as /N/ in
Japanese phonology (see Vance 1987; Shibatani 1990; Tsujimura 1996).
In addition to the coda nasal /N/, the second half of a long vowel and the first half of a
long consonant also behave as individual moras in Japanese. They are called cho-on and
sokuon and represented by /R/ and /Q/ in Japanese phonology, respectively. Examples of
these two moraic segments inside Japanese syllables are shown in Figure 5-2. The two
Japanese words, /tookʲoo/ ‘Tokyo’ and /tatta/ ‘stood’, consist of two syllables and are syllabified
as /too-kʲoo/ and /tat-ta/, respectively. Each syllable of ‘Tokyo’ has a long vowel, /oo/, and
the second half of this vowel is counted as an independent mora. Thus, this word contains
four moras, /to-o-kʲo-o/. The word /tatta/ contains a long consonant /tt/ in the middle of
the word. The first half of this long consonant (i.e., /Q/ or sokuon) is counted as one
mora. Thus, this word is counted as three moras, /ta-t-ta/.
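The mora counts given above can be reproduced with a small segmentation sketch. It assumes an ASCII phonemic transcription in which the special moras are already written as /N/, /Q/ and /R/ (so ‘Tokyo’ is entered as toRkjoR and ‘stood’ as taQta, with kj standing in for the palatalized consonant); this input convention and the function name `segment_moras` are assumptions made for illustration.

```python
VOWELS = set("aiueo")
# N: moraic nasal, Q: first half of a long consonant, R: second half of a long vowel.
SPECIAL_MORAS = set("NQR")

def segment_moras(word):
    """Split a transcription into moras: (onset +) short vowel, or one of N, Q, R."""
    moras, onset = [], ""
    for symbol in word:
        if symbol in SPECIAL_MORAS:
            if onset:
                raise ValueError(f"onset {onset!r} has no vowel before {symbol!r}")
            moras.append(symbol)
        elif symbol in VOWELS:
            moras.append(onset + symbol)
            onset = ""
        else:
            onset += symbol  # consonants and glides wait for the next vowel
    if onset:
        raise ValueError(f"trailing onset {onset!r} has no vowel")
    return moras

for word in ("teN", "toRkjoR", "taQta"):
    moras = segment_moras(word)
    print(word, moras, len(moras))
```

The sketch encodes the two generalizations stated in the text: an onset consonant never forms a mora on its own, whereas /N/, /Q/ and /R/ each count as one.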
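Figure 5-2: More examples of Japanese syllable structure
[Tree diagrams. ‘Tokyo’: word (ω) > two syllables (σ σ) > four moras (µ µ, µ µ) > segments /t o o/, /kʲ o o/. ‘stood’: word (ω) > two syllables (σ σ) > three moras (µ µ, µ) > segments /t a t/, /t a/.]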
In addition to word segmentation, evidence for Japanese speakers’ awareness of the
mora can be found in various linguistic phenomena. For example, the Japanese alphabet is
mora-based, in the sense that each letter corresponds to one mora (i.e., the moraic nasal,
the first half of a long consonant, the second half of a long vowel, and every sequence of
onset + short vowel or onset + first half of a long vowel is represented by a single letter)21.
The meter of Japanese poetry (e.g., haiku) is based on mora counting, not on syllable counting.
Word games played by Japanese children are also based on mora counting. Note that there is
no corresponding evidence for the existence of the mora in English (e.g., the English
version of haiku is based on syllable counting, not mora counting).
21 In recent psycholinguistic studies, it has been shown that the Japanese alphabet system, kana, plays a
substantial role in developing the awareness of the mora in L1 Japanese acquisition. See Inagaki, Hatano
and Otake (2000) for experimental evidence.
5.2. Research questions
As shown in the review in the previous section, English and Japanese syllable structures
differ crucially in two linguistic properties: 1) structural complexity (i.e.,
English allows more complex syllables with more consonants than Japanese) and 2) the
absence vs. presence of the mora in speakers’ phonological awareness (i.e., the mora is
present in Japanese, but not in English). Do language learners become aware of the
characteristics of L2 syllable structure? More specifically, the following questions will be
asked with respect to L2 English–L1 Japanese and L2 Japanese–L1 English:
• Do Japanese speakers of L2 English learn to segment English words into
  syllables without breaking English consonant clusters into moras/syllables?
• Do English speakers of L2 Japanese learn to segment Japanese words into
  moras (vs. syllables)?
In order to answer these questions, we conducted a phonological survey in which we asked
participants for judgments on word segmentation.
5.3. Method
5.3.1. Subjects
Seven speakers of L1 English (NE1–7) and 3 advanced and 4 beginning Japanese speakers
of L2 English (AE1–3 and BE1–4, respectively) participated in the English phonological
survey. All Japanese speakers of L2 English except for BE4 also participated in
Experiment 4, which investigated the production of English tense vs. lax vowels. Five
speakers of L1 Japanese (NJ1–5) and 3 advanced and 4 beginning speakers of L2 Japanese
(AJ1–3 and BJ1–4, respectively) participated in the Japanese phonological survey. All
speakers of L2 Japanese also participated in Experiment 5, which investigated the
production of Japanese long vs. short vowels.
5.3.2. Materials
The speech materials of the phonological survey were designed to test all possible
syllable structures for English and Japanese. The English corpus consisted of 44
monosyllabic words; the Japanese corpus consisted of 24 words, varying in the number of
moras from one to three. Target words used in Experiments 4 and 5 (which tested the
production of L2 vowel contrasts) were included. The English and Japanese materials are
listed in Tables 5-2 and 5-3, respectively. In both tables, the syllable structure of each test
word is presented next to the word. In English, both diphthongs and tense vowels are
represented as VV. In Japanese, a long vowel is represented as VR.
Table 5-2: 44 monosyllabic words used in the English phonological survey
word    structure   word    structure   word     structure   word       structure
I       VV          chit    CVC         kept     CVCC        tribes     CCVVCC
it      VC          dip     CVC         next     CVCC        troops     CCVVCC
eat     VVC         kin     CVC         beast    CVVCC       scree      CCCVV
eyes    VVC         pit     CVC         kind     CVVCC       strew      CCCVV
X       VCC         bid     CVC         liked    CVVCC       stray      CCCVV
iced    VVCC        deep    CVVC        least    CVVCC       split      CCCVC
east    VVCC        cheat   CVVC        true     CCVV        strict     CCCVCC
asked   VCCC        keen    CVVC        spit     CCVC        strength   CCCVCCC
say     CVV         bead    CVVC        speak    CCVVC       strive     CCCVVC
cry     CVV         Pete    CVVC        spoil    CCVVC       striped    CCCVVCC
few     CVV         tax     CVCC        stream   CCVVC       streams    CCCVVCC
Table 5-3: 24 words used in the Japanese phonological survey
word     structure   gloss         word      structure   gloss
e‘       V           ‘painting’    tat’ta    CVQCV       ‘stood’
ki‘      CV          ‘tree’        i‘i       VR          ‘good’
bi‘ru    CVCV        ‘building’    i‘i ko    VR (CV)     ‘good child’
ki‘ta    CVCV        ‘came’        a‘u       VV          ‘meet’
ka‘do    CVCV        ‘corner’      ki‘i      CVR         ‘key’
tSi‘zu   CVCV        ‘map’         ka‘a      CVR         ‘car’
to‘ru    CVCV        ‘take’        bi‘iru    CVRCV       ‘beer’
kutSi    CVCV        ‘mouth’       gi’taa    CVCVR       ‘guitar’
e‘n      VN          ‘yen’         tSi‘izu   CVRCV       ‘cheese’
se‘n     CVN         ‘thousand’    ka’ado    CVRCV       ‘card’
at‘ta    VQCV        ‘existed’     to’oru    CVRCV       ‘pass’
itta     VQCV        ‘said’        ko’on     CVRN        ‘corn’
5.3.3. Procedure
Data collection
The selected words were pseudo-randomized. PsyScope was used to present words. One
word was displayed on the computer screen at a time. Before showing the words to be
analyzed, three practice words were presented.
In order to elicit judgments on word segmentation, we asked participants to replace
each syllable/mora with the syllable no. The selection of syllable or mora was open to each
participant. For the Japanese survey, participants were asked to use only the syllable no,
and not noo (with a long vowel), or a glottal stop or the moraic nasal /N/. We required this
in order to avoid ambiguities: if speakers parsed a word such as tooru ‘pass’ with the string
noo-no, we could not tell whether they replaced the bimoraic syllable too with another
bimoraic syllable, or whether they simply preserved the segment length of the first vowel
of this word (tooru). On the other hand, if speakers substitute too with no-no, we can be
sure that they count it as two moras.
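The reasoning behind this instruction can be illustrated with a minimal sketch. It assumes that a word is given as one of the syllable-structure templates of Table 5-3 (C, V and the moraic segments R, N and Q), and that every V opens a new syllable while R, N and Q attach to the preceding syllable. The function names are hypothetical and were not part of the survey procedure.

```python
# Hypothetical helpers; templates follow the notation of Table 5-3.
MORAIC = set("VRNQ")  # a vowel, /R/, /N/ or /Q/ each counts as one mora


def mora_count(template: str) -> int:
    """Number of moras; onset consonants (C) do not add a mora."""
    return sum(1 for symbol in template if symbol in MORAIC)


def syllable_count(template: str) -> int:
    """Number of syllables, assuming each V opens a new syllable and
    R, N and Q belong to the preceding syllable."""
    return template.count("V")


def no_string(count: int) -> str:
    """Render a count as a hyphen-separated sequence of 'no'."""
    return "-".join(["no"] * count)


# tooru 'pass' has the template CVRCV
print(no_string(mora_count("CVRCV")))      # no-no-no (mora-based)
print(no_string(syllable_count("CVRCV")))  # no-no    (syllable-based)
```

Under these assumptions, a response with a long vowel (noo-no) would be compatible with both counts, which is the ambiguity the instruction was meant to remove.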
Segmentation judgments were recorded in the recording booth of the UCLA phonetics
lab for the L1 English, advanced L2 English, L1 Japanese and L2 Japanese groups. The
beginning Japanese speakers of L2 English were recorded in the recording room of the Meiji
Gakuin University Information Center in Tokyo.
Data analysis
For each participant, the collected data were transcribed. For English words, all phonetic
variations of the syllable no (e.g., [noU], [no:] or [no]) were counted as one count of the
same segmentation unit /no/. For Japanese words, only [no] with the short vowel /o/ was
used by all participants, since they were instructed to do so. For every word, the number
of no used was counted.
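The counting step can be sketched as follows. This is only an illustration: the input format (a response transcribed as a hyphen- or space-separated string) and the function name are assumptions, and the set of variants contains only the three forms mentioned above.

```python
import re

# Phonetic variants counted as one instance of /no/ (those listed in the text).
NO_VARIANTS = {"noU", "no:", "no"}


def count_no(transcription: str) -> int:
    """Count /no/ tokens in a hyphen- or space-separated transcription."""
    tokens = [t for t in re.split(r"[-\s]+", transcription.strip()) if t]
    return sum(1 for t in tokens if t in NO_VARIANTS)


print(count_no("noU"))        # 1
print(count_no("no-no:-no"))  # 3
```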
5.4. Results
5.4.1. L2 English segmentation
L1 English vs. L2 English patterns
For each participant, the counts of no produced in correspondence to all 44 English test
words are presented in Table 5-4. Tense vowels and diphthongs are represented by VV,
lax vowels by V.
Table 5-4: Number of /no/ (representing the segmentation of English words
by L1 English speakers and Japanese speakers of L2 English)
word       syllable    L1 English                    L2 English
           structure   NE1 NE2 NE3 NE4 NE5 NE6 NE7   AE1 AE2 AE3 BE1 BE2 BE3 BE4
I
it
eat
eyes
X
iced
east
asked
say
cry
few
chit
dip
VV
CVC
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
3
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
3
1
3
2
4
1
2
1
2
2
2
1
2
1
2
1
3
2
3 --3
3
3
1
4
3
2
1
3
2
1
1
2
1
3
1
2
2
3
3
4
4
4
4
2
3
2
2
2
1
2
2
3
3
3
3
4
2
3
1
2
2
kin        CVC         1   1   1   1   1   1   1     1   2   1   2   2   2   2
2
3
3
3
3
4
2
2
1
2
2
1 ---
pit        CVC         1   1   1   1   1   1   1     1   3   1   2   2   2   2
bid        CVC         1   1   1   1   1   1   1     1   3   1   3   2   2   2
deep
cheat
keen
CVVC
CVVC
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
1
2
2
2
1
1
1
3
3
3
2
2
3
2
2
1
2
2
1
bead       CVVC        1   1   1   1   1   1   1     2   2   1   3   2   2   2
Pete
tax
kept
CVVC
CVCC
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
3
2
3
3
2
1
3
3
3
3
2
2
3
2
3
3
2
3
3
next       CVCC        1   1   1   1   1   1   1     1   4   3   3   3   3   4
beast
kind
CVVCC
1
1
1
1
1
1
1
1
1
1
1
1
2
1
3
2
3
4
3
1
4
4
3
3
3
3
3 ---
VC
VVC
VVC
VCC
VVCC
VVCC
VCCC
CVV
CVV
CVV
CVC
CVVC
CVCC
CVVCC
1
2
2
3
2
3
3
3
1
2
1
2
2
(Table 5-4 is continued.)
word       syllable    L1 English                    L2 English
           structure   NE1 NE2 NE3 NE4 NE5 NE6 NE7   AE1 AE2 AE3 BE1 BE2 BE3 BE4
liked      CVVCC       1   1   1   1   1   1   1     2   4   3   4   3   3   3
least      CVVCC       1   1   1   1   1   1   1     2   3   1   4   3   3   3
true
spit
speak
spoil
CCVV
CCVVC
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
3
3
1
3
3
4
1
3
3
3
3
4
4
4
2
3
2
3
2
3
2
3
2
3
3
3
stream     CCVVC       1   1   1   1   1   1   1     2   4   3   5   3   3   3
tribes     CCVVCC      1   1   1   2   1   1   2     4   4   3   5   4   4   4
troops     CCVVCC      1   1   1   1   1   1   1     2   4   3   5   4   4   4
scree
strew
CCCVV
CCCVV
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
3
3
2
2
4
3
3
3
3
3
3
3
stray      CCCVV       1   1   1   1   1   1   1     1   3   3   4   3   3   4
split
strict
CCCVC
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
5
5
3
3
4
5
3
3
4
5
4
5
strength   CCCVCCC     1   1   1   1   1   1   1     2   5   3   6   4   5   5
strive
striped
streams
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
2
1
3
2
2
4
5
5
3
3
3
5
6
6
3
5
5
4
4
5 --5
4
CCVC CCVVC CCCVCC CCCVVC CCCVVCC CCCVVCC
                NE1   NE2   NE3   NE4   NE5   NE6   NE7   AE1   AE2   AE3   BE1   BE2   BE3   BE4
TOTAL           44    45    44    48    44    44    50    85    136   86    157   121   120   117
average/word    1     1.02  1     1.09  1     1     1.13  1.93  3.09  2     3.65  2.75  2.73  2.85
Since all test words are monosyllabic, we expect that every L1 English speaker will
replace each English word with 1 count of /no/. Shaded cells indicate non-native patterns,
i.e., words produced with more than one /no/. The sum of the number of /no/ instances for
all 44 words was computed for each participant. Results are presented in the second-to-last
row of the table. Furthermore, an average /no/ count per word was computed for each
participant by dividing the total number of /no/ counts by the total number of words (i.e.,
average count/word = total count/44 words).
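As a worked illustration of this calculation, the sketch below applies the formula to two of the L1 English totals reported in Table 5-4 (45 for NE2 and 48 for NE4). The function name is hypothetical, and the default of 44 words follows the formula given above.

```python
def average_no_per_word(total_no: int, n_words: int = 44) -> float:
    """Average /no/ count per word: total count divided by number of words."""
    return total_no / n_words


print(round(average_no_per_word(45), 2))  # 1.02 (NE2)
print(round(average_no_per_word(48), 2))  # 1.09 (NE4)
```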
The results show that for the L1 English group the expected native pattern is dominant.
There are 308 cases in total (7 speakers × 44 test words), and 298 cases follow the
expected native pattern (a word is parsed as one count of /no/); the expected native pattern
thus accounts for more than 95% of cases. In the L2 English data, on the other hand,
native-like patterns occur in a minority of cases. This difference between L1 and L2
English patterns is also illustrated in Figure 5-3. In this figure, the average of /no/ counts
per word from the last row of Table 5-4 is plotted for all participants. This graph shows
that L1 English speakers generally treat English monosyllabic words as one count of the
segmentation unit no, while Japanese speakers of L2 English tend to break down English
monosyllabic words into multiple counts of /no/.
Figure 5-3: Average number of instances of the segmentation unit /no/
for English monosyllabic words
[Bar chart. y-axis: average /no/ count per word (0–4); x-axis: speakers NE1–NE7 (L1 English), AE1–AE3 (advanced L2 English), BE1–BE4 (beginning L2 English).]
Beginning vs. advanced L2 English
The results of the English survey show the persistence of non-native patterns across all
seven Japanese speakers of L2 English: monosyllabic English words tend to be parsed as
multiple counts of /no/. Is there any systematic difference between advanced and beginning
speakers of L2 English? Do advanced Japanese speakers of L2 English begin to perceive
monosyllabic English words in a native-English-like manner? In order to answer these
questions, we computed the percentage of cases showing a native-English-like pattern (i.e.,
an English monosyllabic word is parsed as one count of /no/) for the L2 English data of
each speaker (Table 5-4), by dividing the number of native-English-like responses by the
total number of responses. As the reference for the native-English pattern, we conducted
the same calculation for each of the L1 English speakers and computed the group average
by pooling the averages of the single speakers. The results are shown in Figure 5-4.
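The percentage calculation can be sketched as below. The function name is hypothetical, and treating a missing response (None) as excluded from the denominator is an assumption, not something stated in the procedure. The final line only reproduces the pooled L1 English figure given earlier (298 of 308 cases).

```python
def percent_native_like(counts):
    """Percentage of responses parsed as exactly one /no/.

    counts: one /no/ count per test word; None marks a missing response
    (assumed here to be left out of the total).
    """
    answered = [c for c in counts if c is not None]
    native_like = sum(1 for c in answered if c == 1)
    return 100.0 * native_like / len(answered)


# Hypothetical responses of one speaker to four words
print(percent_native_like([1, 1, 2, 3]))  # 50.0

# Pooled L1 English cases reported in the text
print(round(100.0 * 298 / 308, 1))        # 96.8
```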
Figure 5-4: Percentage of native-English-like patterns
produced by L1 English speakers and Japanese speakers of L2 English
[Bar chart. y-axis: percentage of native-like judgment (%), 0–100; x-axis: NE (L1 English group average), AE1–AE3 (advanced L2 English), BE1–BE4 (beginning L2 English).]
Two advanced speakers of L2 English, AE1 and AE3, showed larger percentages of
native-like patterns than the other five speakers of L2 English, AE2 and BE1–4, although
these two speakers’ percentages are still much lower than the average percentage for L1
English (96%). This suggests that it is possible to learn to perceive an English
monosyllabic word as one count of /no/ to a certain extent, although it is difficult to master
the native-like segmentation. AE2 and BE1–4 showed low percentages, all occurring
within the same range (lower than 5%), and no beginning L2 speaker showed patterns that
were more native-English-like than those of the three advanced L2 speakers. These results
imply some positive correlation between how experienced or proficient Japanese speakers
of L2 English are and how native-like their segmentation judgments become, although there
is no systematic separation between the advanced and beginning L2 English groups.
Syllable or mora?
Which segmentation unit do Japanese speakers of L2 English employ to segment English
monosyllabic words, the syllable or the mora? This question can be partially answered by
analyzing the results for the following four words, whose syllable structures are also
possible in Japanese:
Table 5-5: Results of judgments on the segmentation of English words
whose syllable structures are also possible in L1 Japanese
word       syllable    L1 English                    L2 English
           structure   NE1 NE2 NE3 NE4 NE5 NE6 NE7   AE1 AE2 AE3 BE1 BE2 BE3 BE4
I
say
few
VV
CVV
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
1
1
1
1
2
2
2
1
2
1
kin
CVC
1
1
1
1
1
1
1
1
2
1
2
2
CVV
1
1
1
2
2
1
1 ---
In the L2 English speakers’ judgments, these words correspond to one count of /no/ in
some cases, as in L1 English, and to two counts of /no/ in other cases. This difference can
be explained in terms of whether or not the mora is used to segment monosyllabic English
words. For example, AE1, AE3 and BE3 segmented say as /no/, while AE2, BE1, BE2
and BE4 segmented it as /no-no/. Two counts of /no/ clearly indicate that the mora was the
unit employed to segment say (i.e., mora-based segmentation of say). A similar
explanation applies to the other three words, I, few and kin.
Among the seven Japanese speakers of L2 English, AE1, AE3 and BE3 treated all four
words as one count of /no/, while BE1 treated all four words as two counts of /no/. AE2,
BE2 and BE4 showed both types. Although two counts of /no/ indicate the use of mora-based
segmentation for English monosyllabic words, one count of /no/ does not constitute
solid evidence of the use of the syllable as a segmentation unit (i.e., syllable-based
segmentation), due to a problem with the method used to collect speakers’ judgments in the
English survey. Unlike in the Japanese survey, we did not ask participants to use only /no/
with the short vowel /o/. We believe that this is not a serious problem for the interpretation
of the L1 English data. As shown in Table 5-5, we found not a single case of two counts of
/no/ in the judgments of the seven L1 English speakers for the analyzed words (and, as shown
in Table 5-4, which presents all 44 tested words, there was hardly any instance of two
counts of /no/ in general). This suggests that one count of /no/ in the L1 English data
reflects the use of a single unit, the syllable, for segmenting monosyllabic words.
However, it is possible that the Japanese speakers of L2 English occasionally used /noo/
instead of /no/ to represent two moras. Thus, we can only conclude that Japanese speakers of
L2 English used mora-based segmentation in some cases, while L1 English speakers used only
the syllable, to segment English monosyllabic words. More data collected following the same
method used for our Japanese survey are needed in order to determine whether L2 English
speakers ever use the syllable to parse monosyllabic but bimoraic words.
Treatment of consonant clusters in L2 English
The structural complexity of a syllable can be measured by the number of consonants
within the syllable, without distinguishing between onset and coda. A close examination of
the L2 data in Table 5-4 shows that Japanese speakers of L2 English tend to segment a
more complex syllable into more counts of /no/. In order to show this correlation, the
average count of /no/ was calculated for a representative speaker from each speaker group,
depending on the number of consonants of a test word (e.g., 0 for I (VV), 6 for strength
(CCCVCCC)). Results are shown in Figure 5-5.
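The grouping behind this calculation can be sketched as follows. The function names are hypothetical; the three example responses are BE1's counts for kin, next and strength as read from Table 5-4, paired with the syllable structures assigned to those words in Table 5-2.

```python
from collections import defaultdict


def consonant_count(template: str) -> int:
    """Number of consonants in a syllable template, onset and coda alike."""
    return template.count("C")


def mean_no_by_consonants(responses):
    """Average /no/ count for each consonant count.

    responses: iterable of (template, no_count) pairs for one speaker.
    """
    grouped = defaultdict(list)
    for template, no_count in responses:
        grouped[consonant_count(template)].append(no_count)
    return {n: sum(v) / len(v) for n, v in sorted(grouped.items())}


# BE1: kin (CVC) = 2, next (CVCC) = 3, strength (CCCVCCC) = 6
be1 = [("CVC", 2), ("CVCC", 3), ("CCCVCCC", 6)]
print(mean_no_by_consonants(be1))  # {2: 2.0, 3: 3.0, 6: 6.0}
```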
Three distinctive patterns emerge across the three speakers. NE1 in the L1 English
group consistently categorized test words as monosyllabic regardless of how many
consonants they contain. In contrast, BE1, in the beginning L2 English group, shows a clear
positive correlation between the average of /no/ counts and the number of consonants in
monosyllabic English words. This indicates that this speaker perceives a sequence of English
consonants as a sequence of moras or syllables. Finally, AE1 in the advanced L2 group shows
an intermediate pattern.
Figure 5-5: Average number of occurrences of the segmentation unit /no/
as a function of the number of consonants in a syllable
[Three plots, one per representative speaker: NE1 (L1 English), AE1 (advanced L2 English), BE1 (beginning L2 English). x-axis: # of consonants in syllable (0–6); y-axis: average count of /no/ (0–7).]
5.4.2. L2 Japanese segmentation
L1 Japanese vs. L2 Japanese patterns
For each participant, the number of occurrences of the segmentation unit no for 15 words
which include bimoraic syllables (i.e., syllables containing the moraic nasal /N/, the second
half of a long vowel /R/ or the first half of a long consonant /Q/) is presented in Table 5-6.
As in the English survey, the total number of /no/ counts for each participant is presented in
Table 5-6 and plotted in Figure 5-6. We expected that in native Japanese /no/ would be treated
as a mora, not as a syllable. This is a legitimate assumption, since we asked participants to
use only /no/ with the short vowel /o/. We expected the moraic nasal, the second half of a long
vowel and the first half of a long consonant to be produced as one count of /no/, and this is
indeed the pattern shown in the results of our Japanese survey. In Table 5-6, shaded cells
indicate non-native patterns. The expected native patterns are presented in Table 5-7.
Table 5-6: Number of /no/ (representing the segmentation of Japanese words
by L1 Japanese speakers and English speakers of L2 Japanese)
                               L1 Japanese           L2 Japanese
           word     structure  NJ1 NJ2 NJ3 NJ4 NJ5   AJ1 AJ2 AJ3 BJ1 BJ2 BJ3 BJ4
with /N/   en       VN         2   2   2   2   2     2   2   2   1   1   1   1
           sen      CVN        2   2   2   2   2     2   2   2   1   1   1   1
           koon     CVRN       3   3   3   3   3     3   3   3   3   1   2   1
with /R/   ii       VR         2   2   2   2   2     2   2   2   2   1   1   1
           kaa      CVR        2   2   2   2   2     2   2   2   2   1   2   1
           kii      CVR        2   2   2   2   2     2   2   2   2   1   2   1
           hai      CVR        2   2   2   2   2     2   2   2   1   1   1   1
           gitaa    CVCVR      3   3   3   3   3     3   3   3   3   2   3   2
           biiru    CVRCV      3   3   3   3   3     3   3   3   3   2   3   2
           tooru    CVRCV      3   3   3   3   3     3   3   3   3   2   2   2
           kaado    CVRCV      3   3   3   3   3     3   3   3   3   2   3   2
           chiizu   CVRCV      3   3   3   3   3     3   3   3   3   2   3   2
with /Q/   tatta    CVQCV      3   3   3   3   2     2   3   3   2   2   2   2
           atta     VQCV       3   3   3   3   2     2   3   3   2   2   2   2
           itta     VQCV       3   3   3   3   2     2   3   3   2   2   2   3
total                          41  41  41  41  38    38  41  41  35  24  32  25
Figure 5-6: Total number of occurrences of the segmentation unit /no/ for Japanese words
[Bar chart. y-axis: total number of /no/ (0–40); x-axis: speakers NJ1–NJ5 (L1 Japanese), AJ1–AJ3 (advanced L2 Japanese), BJ1–BJ4 (beginning L2 Japanese).]
Table 5-7: Expected native patterns in the segmentation of Japanese words
word     syllable   mora segmentation
en       VN         no-no
sen      CVN        no-no
koon     CVRN       no-no-no
ii       VR         no-no
kaa      CVR        no-no
kii      CVR        no-no
hai      CVR        no-no
gitaa    CVCVR      no-no-no
biiru    CVRCV      no-no-no
tooru    CVRCV      no-no-no
kaado    CVRCV      no-no-no
chiizu   CVRCV      no-no-no
tatta    CVQCV      no-no-no
atta     VQCV       no-no-no
itta     VQCV       no-no-no
We find that the expected native Japanese patterns are dominant in both the L1 Japanese
and advanced L2 Japanese groups: four out of five L1 Japanese speakers and two out of three
advanced speakers of L2 Japanese fully parsed the tested Japanese words into moras. For
example, tooru ‘pass’ consists of one bimoraic syllable and one monomoraic syllable (i.e.,
too-ru). This word is parsed as /no-no-no/ (vs. /no-no/) when the mora is employed as a
segmentation unit. In contrast, the four beginning speakers of L2 Japanese produced
non-native patterns, i.e., syllable-based segmentation, in most cases. For example, sen
‘thousand’ consists of one bimoraic syllable, and it is parsed as /no/, not as /no-no/,
indicating that the syllable is employed as a segmentation unit. Among the four beginning
speakers of L2 Japanese, BJ2 and BJ4 produced non-native patterns (syllable-based
segmentation) in almost all cases, while BJ1 and BJ3 mixed native (mora-based) and
non-native (syllable-based) patterns.
This difference between the three speaker groups (L1 Japanese, advanced L2 Japanese
and beginning L2 Japanese) is reflected in the total count of occurrences
of the segmentation unit, as shown in Table 5-6 and Figure 5-6: the total number of
instances of the segmentation unit is larger in the L1 Japanese and advanced L2 Japanese
groups than in the beginning L2 Japanese group. Mora-based segmentation contributes to a
larger total number of occurrences of the segmentation unit /no/.
Which moraic segment is hard to perceive as an independent mora?
In Table 5-6, test words are categorized on the basis of which of the three moraic segments
is present in the bimoraic syllable: the moraic nasal /N/, the second half of a long vowel /R/
or the first half of a long consonant /Q/. The examination of the table shows that BJ1 and
BJ3 produced more native-like patterns for test words with /R/ than for words with /N/ and
/Q/. This suggests that L2 Japanese speakers begin to treat /R/ (the second half of a long
vowel) as an individual mora before the other two types of moraic segments (/N/ and /Q/).
At this moment, it is difficult to explain why the second half of a Japanese long vowel
/R/ is easier for native English speakers to perceive as an independent mora than the coda
consonants /N/ and /Q/22. However, we may find cues to explain the pattern in the
crosslinguistic typology of syllable weight (see Gordon 1999 for a review). As Gordon’s
survey shows, in languages with a metrical system sensitive to syllable weight, (C)VV
with a long vowel is more likely to be treated as heavy (bimoraic) than (C)VC with a short
vowel followed by a coda consonant. Thus, it is also possible that it is easier to learn to
treat a (C)VV syllable as bimoraic than to learn to treat a (C)VC syllable as bimoraic.
22 With respect to /Q/, notice that the failure to parse it as a mora could be due to the fact that English
speakers in general have a hard time producing long consonants. Consequently, it is possible that they do
not perceive the first half of a geminate as part of the coda of the preceding syllable.
5.5. Discussion
5.5.1. L1 vs. L2 word segmentation
Are L2 speakers aware of L2 syllable structures in the same way in which they are aware of
L1 syllable structures? I will try to answer this question by comparing L1 vs. L2 speakers’
judgments on word segmentation in English and Japanese.
Word segmentation in L1 Japanese and L1 English
The results of the English and Japanese phonological surveys confirmed the expected
native patterns for each language. L1 English speakers segmented monosyllabic English
words with one count of the segmentation unit /no/ in more than 95% of cases, regardless of
how complex their syllable structure was. L1 Japanese speakers employed the mora in
order to segment Japanese test words: syllables including moraic segments (/N/, /R/ or /Q/)
were treated as two moras in almost all cases. This confirms that the syllable and the mora
are used as units of word segmentation in L1 English and L1 Japanese, respectively.
The effects of L1 characteristics on L2 segmentation
Two major patterns emerge from the analysis of L2 English data. The first pattern is that
Japanese speakers of L2 English tend to divide single complex English syllables into
multiple counts of /no/ by breaking consonant clusters. This occurs when the structures of
English syllables do not satisfy the constraints of Japanese syllable structure, where no
consonant cluster other than /N.Q/ is allowed and only nasals or the first half of geminate
consonants can occur in the coda position. The structural simplicity of L1 Japanese
syllables negatively transfers to the syllable structures of L2 English. The second major
pattern in L2 English is the following: some Japanese speakers of L2 English use mora-based
segmentation and reanalyze the structure of English syllables even when
the structures of those English syllables are legitimate in Japanese, as shown in Table 5-5.
This indicates that Japanese speakers of L2 English apply L1 Japanese mora-based
segmentation in their L2 English segmentation.
As for L2 Japanese segmentation, while advanced speakers of L2 Japanese showed
mora-based segmentation (the consistent pattern in L1 Japanese), beginning speakers of L2
Japanese used syllable-based segmentation in most cases. This difference is due to a
difference in the treatment of bimoraic syllables containing moraic segments: beginning
speakers of L2 Japanese treated a bimoraic Japanese syllable as one count of /no/ (i.e., one
syllable), while native Japanese speakers and advanced speakers of L2 Japanese treated it
as two counts of /no/ (i.e., two moras). This indicates that beginning speakers of L2
Japanese tend not to treat moraic segments as individual moras in L2 Japanese word
segmentation. However, this pattern needs to be interpreted with caution: it is not possible
to conclude that beginning speakers of L2 Japanese are not aware of the presence of the
mora. In the case of L2 English segmentation, the larger number of instances of the
segmentation unit /no/ is a solid piece of evidence for the claim that Japanese speakers
decompose an English syllable structure into multiple moras/syllables, due to the negative
transfer of L1 Japanese characteristics. However, in the case of the Japanese data, the
smaller number of counts of /no/ does not necessarily mean that beginning speakers of L2
Japanese use only the syllable without being aware of the mora. As Japanese has both
moras and syllables, beginning L2 Japanese speakers could replace each syllable with /no/
even if they are aware of the existence of moras. The only thing we can be sure about
concerns the awareness of the mora by advanced speakers of L2 Japanese: their native-like
judgments on Japanese word segmentation show that they have learned to analyze the
internal structures of Japanese syllables with the use of the mora.
5.5.2. Word segmentation in beginning vs. advanced L2 speech
We have observed an asymmetry between the two L2 types in terms of the mastery of
native-like segmentation by advanced L2 speakers. In both L2 English and L2 Japanese,
beginning speakers showed the effect of their L1 segmentation patterns: e.g., the use of the
mora and the decomposition of one English syllable into multiple moras/syllables in L2
English–L1 Japanese; the failure to treat moraic segments as individual moras in L2
Japanese–L1 English (at least at the surface level). However, while advanced speakers of
L2 English show non-native patterns in their segmentation judgments, advanced speakers
of L2 Japanese show the same patterns as L1 Japanese speakers do. This asymmetry may
be due to the fact that learning a new category (the mora for English speakers) is easier than
ignoring/suppressing/deleting sensitivity to a pre-existent category (the mora for Japanese
speakers).
Another possibly relevant factor is the difference between the English and Japanese
writing systems. English speakers could be helped in acquiring the Japanese mora category
by the Japanese alphabet, kana. In the kana system, each letter corresponds to one mora.
It is not difficult to imagine that the Japanese alphabet consistently reminds English
speakers of the existence of the mora (a similar effect was observed in L1 Japanese
acquisition by Inagaki et al. (2000)). This type of visual help is not available for Japanese
speakers learning English, since in English the writing system does not show the native-like
syllabification of words.
5.5.3. Connection between awareness of L2 syllable structures and L2 segmental production
How is the awareness of L2 syllable structures related to L2 segmental production? In
order to answer this question, we consider one case for each L2 type.
L2 English–L1 Japanese: phonological awareness of English syllable
structure vs. production of the duration contrast between English tense vs.
lax vowels
As mentioned before, beginning Japanese speakers of L2 English have difficulty
producing English words with consonant clusters or coda consonants (other than /N/ or /Q/).
As a solution, they optimize complex English syllables by simplifying their structure. This
strategy is used in Japanese loanword phonology as well. A classic example is McDonald,
which is borrowed from English and pronounced as /ma.ku.do.na.ru.do/ with vowel epenthesis.
It is a well-known fact that many experienced Japanese speakers of L2 English eventually
stop vowel epenthesis and become able to produce English consonant clusters. However, the
results of the English survey show that even advanced speakers of L2 English (AE1, AE2 and
AE3) are still not fully aware of English syllable structures and remain under a strong
influence from L1 Japanese syllable structure and mora-based segmentation.
A similar discrepancy is found between the phonological awareness of L2 English
syllable structure and the production of a duration contrast between English tense vs. lax
vowels. One of the advanced speakers of L2 English in the English survey, AE2, also
participated in Experiment 4, in which we investigated the production of English tense and
lax vowels by Japanese speakers. In the English survey, AE2’s segmentation judgments
showed the influence of L1 Japanese characteristics as strongly as those of the beginning
speakers of L2 English. If the phonological awareness of English syllable structures developed
together with the production of English tense and lax vowels in L2 English development,
we would expect Speaker AE2 to produce a duration ratio of English tense/lax vowels
similar to the duration ratio of his L1 Japanese long/short vowels. However, this was not
the case. AE2 successfully approximated the native-like duration ratio of English tense/lax
vowels (see Figure 3-6 in Chapter 3), even though he still parsed English tense vowels and
diphthongs as bimoraic in the English survey (see Table 5-5).
These discrepancies between the production of English segments and the phonological
awareness of English syllable structures in L2 English–L1 Japanese are interesting, since
they suggest that Japanese speakers are able to suppress L1 Japanese segmental patterns
and can produce native-English-like segments even though they still retain Japanese moraic
structures in their mind.
L2 Japanese–L1 English: phonological awareness of Japanese moras and
production of the duration contrast between Japanese long vs. short vowels
In Experiment 5, the duration contrast between Japanese long vs. short vowels in L2
Japanese production by English speakers was investigated. The results of that experiment
show that English speakers choose the spectral region of English tense /i/ and differentiate
long and short vowels only by duration. Is there any systematic relation between duration
contrast patterns and the awareness of the bimoraicity of Japanese long vowels? In order to
answer this question, we summarize in Table 5-8 the judgments on the target words that were
used both in Experiment 5 and in the Japanese survey presented in this chapter.
Table 5-8: Number of /no/ (representing the segmentation of Japanese words
containing short vs. long vowels)

                   L1         L2 Japanese
                   Japanese   AJ1   AJ2   AJ3   BJ1   BJ2   BJ3   BJ4
biru     CVCV      2          2     2     2     2     2     2     2
biiru    CVRCV     3          3     3     3     3     2     3     2
chizu    CVCV      2          2     2     2     2     2     2     2
chiizu   CVRCV     3          3     3     3     3     2     3     2
kado     CVCV      2          2     2     2     2     2     2     2
kaado    CVRCV     3          3     3     3     3     2     3     2
toru     CVCV      2          2     2     2     2     2     2     2
tooru    CVRCV     3          3     3     3     3     2     2     2
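The counts in Table 5-8 can be reduced to one segmentation label per L2 speaker. The following Python sketch is only an illustration. It assumes a simple decision rule that is not stated in the study itself: a speaker is labeled mora-based if every long-vowel (CVRCV) word received three /no/, syllable-based if every such word received two, and mora-/syllable-based otherwise. The function name and the dictionary layout are hypothetical.

```python
# /no/ counts for the four long-vowel (CVRCV) words in Table 5-8,
# listed in the order biiru, chiizu, kaado, tooru.
LONG_VOWEL_COUNTS = {
    "AJ1": [3, 3, 3, 3],
    "AJ2": [3, 3, 3, 3],
    "AJ3": [3, 3, 3, 3],
    "BJ1": [3, 3, 3, 3],
    "BJ2": [2, 2, 2, 2],
    "BJ3": [3, 3, 3, 2],
    "BJ4": [2, 2, 2, 2],
}


def segmentation_label(counts):
    """Label a speaker's parsing of long vowels (assumed rule, see lead-in).

    Three /no/ for a CVRCV word means the long vowel was counted as two
    units; two /no/ means it was counted as a single unit.
    """
    if all(c == 3 for c in counts):
        return "mora-based"
    if all(c == 2 for c in counts):
        return "syllable-based"
    return "mora-/syllable-based"


labels = {speaker: segmentation_label(c) for speaker, c in LONG_VOWEL_COUNTS.items()}

for speaker, label in labels.items():
    print(speaker, label)
```

Under this assumed rule, a single deviating word is enough to move a speaker out of the mora-based group, which is a deliberately strict choice for a sample of four words.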
To review the duration contrasts between Japanese long and short vowels which emerged
from the results of Experiment 5, the average duration ratio of long/short vowels is
summarized in Table 5-9, for three word pairs for each speaker. For L1 Japanese data, the
ratio range of three speakers is presented. In both tables, non-native patterns are indicated
by shaded cells.
Table 5-9: Average duration ratios of Japanese long/short vowels
(based on the results of Experiment 4)

                   L1         L2 Japanese
                   Japanese   AJ1    AJ2    AJ3    BJ1    BJ2    BJ3    BJ4
biru vs. biiru     1.6~2.3    2.15   2.9    3.0    2.5    2.6    1.8    2.7
kado vs. kaado     2.1~2.5    2      2.9    4.0    2.3    2.7    1.7    1.7
toru vs. tooru     2.0~2.4    2.4    2.8    3.4    2.1    2.8    1.7    2.9
Table 5-10 summarizes the performance of each L2 Japanese speaker with respect to
the following two aspects (based on the information presented in Tables 5-8 and 5-9,
respectively): the segmentation of Japanese long vowels and the duration contrast between
Japanese long vs. short vowels in his/her L2 Japanese production, as observed in
Experiment 5.
Table 5-10: Segmentation of Japanese long vowels and long-short duration contrast
for each L2 Japanese speaker

       segmentation of           long-short
       Japanese long vowels      duration contrast
AJ1    mora-based                native-like
AJ2    mora-based                too large
AJ3    mora-based                too large
BJ1    mora-based                native-like
BJ2    syllable-based            too large
BJ3    mora-/syllable-based      too small
BJ4    syllable-based            too large or too small
In Table 5-10, we observe three patterns in terms of the distribution of native-like and
non-native-like patterns. AJ1 and BJ1 produced native-like patterns for both word
segmentation and duration contrasts; AJ2 and AJ3 showed native-like word segmentation
but excessively large duration contrasts; finally, BJ2, BJ3 and BJ4 showed neither
native-like segmentation nor native-like duration contrasts. Thus, the only two speakers
showing native-like duration contrasts for Japanese long/short vowels treated Japanese
long vowels as bimoraic, while none of the three speakers of L2 Japanese who did not show
consistently mora-based segmentation (BJ2, BJ3 and BJ4) showed native-like duration
contrasts. This suggests that the ability to segment Japanese long vowels as bimoraic is
correlated with the ability to produce a native-like duration contrast between Japanese
long and short vowels.
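The relation described above can be checked by cross-tabulating the two columns of Table 5-10. The Python sketch below is a minimal illustration, not part of the original analysis. It assumes a binary coding in which only "mora-based" counts as native-like segmentation and only "native-like" counts as a native-like duration contrast; the variable and function names are hypothetical.

```python
from collections import Counter

# Per-speaker labels transcribed from Table 5-10:
# (segmentation of long vowels, long-short duration contrast).
TABLE_5_10 = {
    "AJ1": ("mora-based", "native-like"),
    "AJ2": ("mora-based", "too large"),
    "AJ3": ("mora-based", "too large"),
    "BJ1": ("mora-based", "native-like"),
    "BJ2": ("syllable-based", "too large"),
    "BJ3": ("mora-/syllable-based", "too small"),
    "BJ4": ("syllable-based", "too large or too small"),
}


def cross_tabulate(table):
    """Count speakers by (native-like segmentation?, native-like contrast?)."""
    counts = Counter()
    for segmentation, contrast in table.values():
        native_segmentation = segmentation == "mora-based"  # assumed coding
        native_contrast = contrast == "native-like"          # assumed coding
        counts[(native_segmentation, native_contrast)] += 1
    return counts


counts = cross_tabulate(TABLE_5_10)

for (seg, dur), n in sorted(counts.items(), reverse=True):
    print(f"native-like segmentation={seg}, native-like contrast={dur}: {n}")
```

The cell for native-like duration contrast without mora-based segmentation stays empty, which is the pattern the paragraph above describes. With seven speakers, the tabulation is descriptive only and does not support a statistical test.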
5.6. Summary
In order to find out to what extent L2 speakers are aware of the syllable structures of their target
language, we conducted a phonological survey for English and Japanese, in which
participants were asked to segment words with the use of the segmentation unit /no/. The
expected native pattern was observed in L1 speakers’ judgments on word segmentation: L1
English speakers treated monosyllabic English words as one syllable, while L1 Japanese
speakers treated Japanese moraic segments as individual moras. Two major patterns were
found in L2 English speakers’ judgments: first, the decomposition of one English syllable
into multiple moras/syllables; second, the use of the mora for English word segmentation.
These two patterns indicate that the characteristics of L1 Japanese syllable structure
strongly affect the awareness of English syllable structures even among advanced Japanese
speakers of English.
On the other hand, the L2 Japanese data showed that English
speakers become aware of Japanese moraic segments (i.e., of the bimoraicity of Japanese
complex syllables), and there is a positive correlation between proficiency levels and the
phonological awareness of Japanese moraicity. The asymmetry between L2 English and
L2 Japanese in the ability to become aware of the syllable structures of the target
language can be explained by the hypothesis that learning a new prosodic category (the
mora for English speakers learning Japanese) is less challenging than ignoring/suppressing
a category existing in L1 (the mora for Japanese speakers learning English). Another possible factor
which could be causing this asymmetry is the difference between English and Japanese in
terms of how they represent syllable structures in the writing system. The Japanese kana
system, which is mora-based, may help English speakers become aware of mora-based
parsing, while the English writing system offers Japanese speakers learning English no
equivalent visual cue for developing phonological awareness of English
syllable structures.
Chapter 6: Conclusion
In the present study I conducted seven phonetic experiments and a phonological experiment
in order to investigate how L1 prosodic characteristics affect production of L2 prosodic
patterns. I compared L2 English–L1 Japanese and L2 Japanese–L1 English. While the
significance of the specific results of each experiment was discussed in previous chapters, I
will conclude with some general remarks about aspects of the study of L2 prosody which
emerge from the reported work, and I will indicate some directions that future research
could explore.
First, past research on L2 speech development at the segmental level has shown that it
is important to take phonetic details into account in order to achieve a better understanding
of language transfer in L2 speech development (e.g., Brière 1968 or Flege 1987). The
present study has demonstrated that this is also the case for research on the prosodic level.
Second, prosody is phonetically realized by multiple correlates, which differ from
language to language. The present study has shown that it is important to investigate
various correlates relevant to the prosodic phenomenon of interest in L2, since transfer
patterns can vary greatly from correlate to correlate. Future research should analyze the
entire set of relevant correlates for a comprehensive picture of L2 prosodic patterns (e.g.,
for accent, intensity and vowel quality should be included in addition to F0 and duration,
the correlates that were examined in the present study).
Third, it is essential to consider whether a certain prosodic feature plays a phonological
role just in one language, in both languages, or in neither language. The present study has
shown that different transfer patterns in the learner’s production can be explained by a
difference between L1 and L2 in terms of the phonological status of a relevant prosodic
feature.
Fourth, the analysis of the contrast between English tense vs. lax vowels and between
Japanese long vs. short vowels in L2 speech production has shown that there exists a
systematic interaction of prosodic and segmental levels in the transfer of L1 features in L2
speech development.
Fifth, the comparison of the data from the phonological experiment with the results of
the phonetic experiments has shown that L2 phonological prosodic units are not necessarily
learned in parallel with the L2 phonetic patterns of prosodic phenomena.
We have observed two types of interactions between these two aspects of L2 speech. In some
cases, L2 phonetic patterns can be learned while retaining phonological awareness based on
L1 prosodic units (e.g., advanced Japanese speakers of L2 English produce a
native-English-like duration contrast between English tense and lax vowels successfully even
though they still base their phonological segmentation of English words on Japanese
moraic structures). In other cases, phonological awareness of L2 prosodic units appears to
be correlated with, and possibly required for, the acquisition of L2 phonetic patterns (e.g., all
English speakers showing native-Japanese-like duration contrasts between Japanese long
and short vowels also showed native-Japanese-like awareness of Japanese moras).
Finally, the investigation of multiple prosodic phenomena in the present study suggests
that an L2 speaker’s prosodic system does not necessarily develop in a parallel manner for
different dimensions of prosody. Some beginning learners are bound to L1 phonetic habits
in the production of a certain prosodic phenomenon, but at the same time they show
surprisingly native-like L2 patterns in the production of another prosodic phenomenon.
Which factors affect the learner’s proficiency in any one prosodic dimension still needs to
be clarified. Among the potential factors we hypothesized are the phonological status of a
phonetic pattern, the relative importance of the relevant correlate in the production of a
specific prosodic phenomenon, the learner’s perceptual sensitivity, and the type and
amount of L2 input. To assess the relevance and relative importance of each factor is, of
course, a great challenge for future research.
In the long term, it will be important for L2 prosody research to describe the L2
prosodic system of a speaker comprehensively by considering both tonal and temporal
organization, and to collect this sort of data for more L2 speakers. Furthermore, we need
to take the interaction between the segmental and prosodic level and the interaction between
the phonological and phonetic level into account in the study of language transfer in L2
speech development.
These efforts will help us not only in achieving a better
understanding of L2 speech development, but also in establishing a better model of L2
speech development and accurate methods for diagnosing L2 oral proficiency.
To conclude, while of course many questions are still open and a considerable amount
of further research has to be conducted on the relevant issues, the work reported here
shows that the experimental study of L2 prosody can provide interesting insights into
language transfer in L2 speech development.
REFERENCES
Anderson-Hsieh, J. & H. Venkatagiri. (1994). Syllable duration and pausing in the speech
of Chinese ESL speakers. TESOL Quarterly 28 (4), 807-812.
Argyres, Z. (1996). The Cross-cultural Pragmatics of Intonation: the Case of
Greek-English. MA thesis, UCLA.
Beckman, M. E. (1982). Segment duration and the ‘mora’ in Japanese. Phonetica 39,
113-135.
Beckman, M. E. (1986). Stress and Non-Stress Accent. Dordrecht: Foris Publications.
Beckman, M. E. (1992). Evidence for speech rhythms across languages. In Y. Tohkura,
E. Vatikiotis-Bateson, & Y. Sagisaka (eds.), Speech Perception, Production and
Linguistic Structure. Tokyo: Ohmsha.
Beckman, M. E. & J. B. Pierrehumbert. (1986). Intonational structure in Japanese and
English. Phonology Yearbook Vol. 3, pp. 255-309.
Beckman, M. E. (1996). The parsing of prosody. Language and Cognitive Processes (to
appear).
Beckman, M. E. & J. Edwards. (1990). Lengthenings and shortenings and the nature of
prosodic constituency. In J. Kingston & M. E. Beckman (eds.), Papers in Laboratory
Phonology I: Between the Grammar and the Physics of Speech. Cambridge:
Cambridge University Press.
Berkovits, R. (1993). Utterance-final lengthening and the duration of final-stop closures.
Journal of Phonetics 21, 479-489.
Best, C. (1995). A direct realist view of cross-language speech perception. In W. Strange
(ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language
Research. Baltimore: York Press.
Bohn, O.-S. & J. E. Flege. (1990). Interlingual identification and the role of foreign
language experience in L2 vowel perception. Applied Psycholinguistics 11, 131-58.
Bloch, B. (1950). Studies in colloquial Japanese IV: Phonemics. Language 26, 86-125.
Bradlow, A., R. Port & K. Tajima. (1995). The combined effects of prosodic variation on
Japanese mora timing. Proceedings of International Congress of Phonetic Sciences 4,
344-347.
Broselow, E. (1983). Non-obvious transfer: On predicting epenthesis errors. In S. Gass &
L. Selinker (eds.), Language Transfer in language learning, pp. 269-280. Rowley,
MA: Newbury House.
Broselow, E. & H.-B. Park. (1995). Mora conservation in second language prosody. In J.
Archibald (ed.), Phonological Acquisition and Phonological Theory. Hillsdale, NJ:
Lawrence Erlbaum.
Cambier-Langeveld, T. (2000). Temporal Marking of Accents and Boundaries. Ph.D.
dissertation, University of Amsterdam.
Campbell, N. (1991). A study of Japanese speech timing from the syllable perspective.
Phonetic Society of Japan 3, 29-39.
Campbell, N. (1992). Segmental elasticity and timing in Japanese speech. In Y. Tohkura,
E. Vatikiotis-Bateson, & Y. Sagisaka (eds.), Speech Perception, Production and
Linguistic Structure. Tokyo: Ohmsha.
Campbell, N. & Y. Sagisaka. (1991). Moraic and syllable-level effects on speech timing.
(Onsei taimingu ni mirareru moora to onsetsu no eikyou ni tsuite). Translation.
Committee for Speech Research, Acoustical Society of Japan SP90-107, 35-40.
Class, A. (1939). The Rhythm of English Prose. Oxford: Basil Blackwell.
Dauer, R. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics 11,
51-62.
Eckman, F. (1977). Markedness and the contrastive analysis hypothesis. Language
Learning 27, 315-30.
Eckman, F. (1985). The markedness differential hypothesis: theory and applications. In B.
Wheatley, A. Hastings, F. Eckman, L. Bell, G. Krukar & R. Rutkowski (eds.),
Current Approaches to Second Language Acquisition: Proceedings of the 1984
University of Wisconsin-Milwaukee Linguistic Symposium, 3-21. Bloomington, IN:
Indiana University Linguistics Club.
Ellis, R. (1985). Understanding Second Language Acquisition. Oxford University Press.
Fant, G., A. Kruckenberg & L. Nord (1991). Durational correlates of stress in Swedish,
French and English. Journal of Phonetics 19, 351-365.
Fear, B., A. Cutler & S. Butterfield (1995). The strong/weak syllable distinction in
English. Journal of the Acoustical Society of America 97 (3), 1893-1904.
Ferreira, F. (1993). Creation of prosody during sentence production. Psychological
Review 100 (2), 233-253.
Flege, J. E. (1987). The production of “new” and “similar” phones in a foreign language:
evidence for the effect of equivalence classification. Journal of Phonetics 15, 47-65.
Flege, J. E. (1988). The production and pronunciation of foreign language speech sounds.
In H. Winitz (ed.), Human Communication and Its Disorders: A Review 2. Norwood,
NJ: Ablex Publishing Corp.
Flege, J. E. (1989). An instrumental study of vowel reduction and stress placement in
Spanish-accented English. Studies in Second Language Acquisition 11, 35-62.
Flege, J. E. (1992). Speech learning in a second language. In C.A. Ferguson, L. Menn &
C. Stoel-Gammon (eds.), Phonological Development: Models, Research, Implications.
Timonium, Maryland: York Press.
Flege, J. E. (1995). Second-language speech learning: Theory, findings, and problems. In
W. Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language
Research. Timonium, Maryland: York Press.
Fougeron, C. (1999). Prosodically conditioned articulatory variations: A review. UCLA
Working Papers in Phonetics 97, 1-73.
Fougeron, C. & P. Keating. (1997). Articulatory strengthening at edges of prosodic
domains. Journal of the Acoustical Society of America 101(6), 3728-3740.
Fries, C. (1945). Teaching and learning English as a foreign language. Ann Arbor:
University of Michigan Press.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. Journal
of the Acoustical Society of America 27, 765-768.
Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech 1,
126-152.
Gårding, E. (1981). Contrastive prosody: A model and its application. Studia Linguistica
35, 146-165.
Gass, S. (1996). The role of language transfer. In W. C. Ritchie & T. K. Bhatia (eds.),
Handbook of Second Language Acquisition. San Diego: Academic Press.
Gass, S. & L. Selinker. (1994). Second Language Acquisition: An Introductory Course.
Lawrence Erlbaum Associates.
Gee, J. P. & F. Grosjean. (1983). Performance structures: A psycholinguistic and linguistic
appraisal. Cognitive Psychology 15, 411-458.
Gordon, M. (1999). Stress and other Weight-Sensitive Phenomena: Phonetics, Phonology
and Typology. Ph.D. dissertation, UCLA.
Han, M. (1961). Japanese Phonology: An Analysis Based on Sound Spectrograms. Ph.D.
dissertation, University of Texas, Austin.
Han, M. (1962). The feature of duration in Japanese. Onsei no Kenkyuu 10, 65-80.
Han, M. (1962). Japanese Phonology: An Analysis Based upon Sound Spectrograms.
Tokyo: Kenkyusha.
Han, M. (1992). The timing control of geminate and single stop consonants in Japanese: A
challenge for nonnative speakers. Phonetica 49, 102-127.
Han, M. (1994). Acoustic manifestations of mora timing in Japanese. Journal of the Acoustical
Society of America 96, 73-82.
Haugen, E. (1956). Bilingualism in the Americas: A Bibliography and Research Guide.
Baltimore: American Dialect Society.
Hayes, B. (1989). Compensatory lengthening in moraic phonology. Linguistic Inquiry
20, 253-306.
Hayes, B. (1984). Review Article: D. Attridge, The Rhythms of English Poetry. Language
60, 914-923.
Hayes, B. (1989). The Prosodic Hierarchy in Meter. In P. Kiparsky & G. Youmans (eds.),
Phonetics and Phonology, Volume 1: Rhythm and Meter. San Diego: Academic
Press.
Hayes, B. (1995). Metrical Stress Theory. The University of Chicago Press.
Hoequist, C., Jr. (1983a). Durational correlates of linguistic rhythm categories. Phonetica
40, 19-31.
Hoequist, C., Jr. (1983b). Syllable duration in stress-, syllable-, and mora-timed
languages. Phonetica 40, 203-237.
Homma, Y. (1973). An acoustic study of Japanese vowels. Study of Sounds 16, 347-368.
Homma, Y. (1981). Durational relationship between Japanese stops and vowels. Journal
of Phonetics 9, 273-281.
Huss, V. (1978). English word stress in the postnuclear position. Phonetica 35, 86-105.
Inagaki, K., G. Hatano & T. Otake. (2000). The effect of kana literacy acquisition on the
speech segmentation unit used by Japanese young children. Journal of Experimental
Child Psychology 75, 70-91.
Ito, J. (1990). Prosodic minimality in Japanese. In K. Deaton, M. Noske & M.
Ziolkowski (eds.), CLS 26-II: Papers from the Parasession on the Syllable in
Phonetics and Phonology, 213-239.
Jinbo, K. (1927). Kokugo no onseijou no tokushitsu [The top phonetic characteristics of
Japanese]. In T. Shibata, Kitamura, H. Kindaichi (eds.), Nihon no gengogaku
[Linguistics of Japan], pp. 5-15. Tokyo: Taishukan (reprinted in 1980).
Jun, S.-A. (1993). The Phonetics and Phonology of Korean Intonation. Ph.D.
dissertation, Ohio State University. [published in 1996 by Garland, NY.]
Jun, S.-A. (1995). A phonetic study of stress in Korean. Poster presented at the 130th
meeting of the Acoustical Society of America, St. Louis, MO.
Jun, S.-A. (1998). The Accentual Phrase in the Korean prosodic hierarchy. Phonology 15
(2), 189-226.
Jun, S.-A. & M. Oh. (2000). Acquisition of Second Language Intonation. Proceedings of
the International Conference on Spoken Language Processing, Beijing, China.
Kaiki, N., Takeda, K., Sagisaka, Y., Katagiri, S., Umeda, T. and Kuwabara, H. (1990).
A large-scale Japanese speech data base. Proceedings of the International Conference
on Spoken Language Processing, 1.5, 17-20.
Kaiki, N. & Y. Sagisaka. (1992). The control of segmental duration in speech synthesis
using statistical methods. In Y. Tohkura, E. Vatikiotis-Bateson, & Y. Sagisaka (eds.),
Speech Perception, Production and Linguistic Structure. Tokyo: Ohmsha.
Kawasaki, H. (1983). Models and data on the temporal regulation of speech: Isochrony in
Japanese and English. Journal of the Acoustical Society of Japan 39 (6), 389-397. (in
Japanese).
Keating, P., T. Cho, C. Fougeron & C.-S. Hsu. (to appear). Domain-initial articulatory
strengthening in four languages. Papers in Laboratory Phonology 6.
Klatt, D. (1975). Vowel lengthening is syntactically determined in a connected discourse.
Journal of Phonetics 3, 129-140.
Klatt, D. (1980). Software for a cascade/parallel formant synthesizer. Journal of the
Acoustical Society of America 67, 971-95.
Kondo, Y. (1998). Prosodic constraint on V-to-V Coarticulation in Japanese. Proceedings
of the International Conference on Spoken Language Processing, Sydney.
Kubozono, H. (1989). The mora and syllable structure in Japanese: Evidence from speech
errors. Language and Speech 32(3), 249-278.
Kubozono, H. (1993). The Organization of Japanese Prosody. Tokyo: Kuroshio
Publishers.
Kubozono, H. (1995). Gokeisei to Oninkoozoo [Word Formation and Phonological
Structure]. Tokyo: Kuroshio Shuppan.
Kubozono, H. (1995). Perceptual evidence for the mora in Japanese. In Connell, B. & A.
Arvaniti (eds.), Phonology and Phonetic Evidence: Papers in Laboratory Phonology
IV. Cambridge University Press.
Kubozono, H. & M. Nakau (1998). Oninkoozoo to akusento [Phonological Structure and
Accent]. Tokyo: Kenkyusha.
Ingram, J. C. & S.-G. Park. (1997). Cross-language vowel perception and production by
Japanese and Korean learners of English. Journal of Phonetics 25(3), 343-370.
Ladd, R. (1996). Intonational Phonology. Cambridge University Press.
Ladefoged, P. (1993). A Course in Phonetics. 3rd Edition. Orlando, FL: Harcourt Brace &
Company.
Lado, R. (1957). Linguistics across cultures. Ann Arbor: University of Michigan Press.
Larsen-Freeman, D. & M. Long. (1991). An Introduction to Second Language
Acquisition. Longman.
Leather, J. & A. James (1996). The acquisition of second language speech. In W. C.
Ritchie & T. K. Bhatia (eds.), Handbook of Second Language Acquisition. San Diego:
Academic Press.
Lehiste, I. (1970). Suprasegmentals. Cambridge, MA: MIT Press.
Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics 5, 253-263.
Lehiste, I. & G. E. Peterson (1959). Vowel amplitude and phonemic stress in American
English. Journal of the Acoustical Society of America 31, 428-435.
Levitt, A. (1992). Reiterant speech as a test of non-native speakers’ mastery of the timing
of French. Journal of the Acoustical Society of America 90 (6), 3008-3018.
Liberman, M. (1975). The Intonational System of English. Ph.D. dissertation, MIT.
Liberman, M. & A. Prince. (1977). On stress and linguistic rhythm. Linguistic Inquiry 8,
249-336.
Maekawa, K. (1994). Is there ‘dephrasing’ of the accentual phrase in Japanese? Ohio State
University Working Papers in Linguistics: Papers from the Linguistic Laboratory 44,
146-165.
Magnuson, J. S. & R. Akahane-Yamada. (1996). Acoustic correlates to the effects of
talker variability on the perception of English /r/ and /l/ by Japanese listeners. In
Proceedings of the Fourth International Congress on Spoken Language Processing
(ICSLP 96). University of Delaware and A. I. du Pont Institute.
Mester, A. (1990). Patterns of truncation. Linguistic Inquiry 21(3), 478-485.
Mitsuya, F. & M. Sugito (1977). A study of the accentual effect on segmental and moraic
duration in Japanese. Annual Bulletin of the Research Institute of Logopedics and
Phoniatrics, University of Tokyo 12, 97-112.
Mochizuki-Sudo, M. & S. Kiritani. (1991). Production and perception of stress-related
durational patterns in Japanese learners of English. Journal of Phonetics 19, 231-248.
Nakatani, L. H., K. D. O'Connor & C. H. Aston (1981). Prosodic aspects of American
English rhythm. Phonetica 38, 84-106.
Nespor, M. & I. Vogel. (1986). Prosodic Phonology. Dordrecht: Foris Publications.
Otake, T. Gengo no rizumu to onsetsu koozoo [Rhythmic structure of Japanese and
syllable structure]. IEICE Tech. Report 89, 55-61.
Peterson, G. E. & H. L. Barney. (1952). Control methods used in a study of vowels.
Journal of the Acoustical Society of America 24, 175-184.
Peterson, G. E. & I. Lehiste. (1960). Duration of Syllable Nuclei in English. Journal of
the Acoustical Society of America 32, 693-703.
Pierrehumbert, J. (1980). The Phonology and Phonetics of English Intonation, Ph.D.
dissertation, MIT. [published 1987 by IULC, Bloomington: Indiana University
Linguistics Club.]
Poser, W. (1990). Evidence for foot structure in Japanese. Language 66, 78-105.
Sagisaka, Y. (1999a). Nihongo onin no jikanchoo seigyo to chikaku [Japanese sound
duration control and perception]. Gengo, 51-56.
Sagisaka, Y. (1999b). Koopasu beesu onsei goosei: Onsei kagaku chishiki ni motozuku
goosei shisutemu koochiku gijitsu no shin paradaimu [Corpus-based speech synthesis
– A new paradigm for synthesis system building based on the knowledge in speech
science]. Proceedings of Meeting of the Acoustical Society of Japan,
September–October, 197-200.
Sato, Y. (1993). The durations of syllable-final nasals and the mora hypothesis in
Japanese. Phonetica 50, 44-67.
Sato, Y. (1995). The mora timing in Japanese: A positive linear correlation between the
syllable count and word duration. Bulletin of Phonetic Society of Japan 209, 40-53.
Sato, Y. (1996). The moraic status of syllable-final nasals in Japanese. Bulletin of Phonetic
Society of Japan 212, 67-75.
Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics 10,
209-30.
Selkirk, E. (1983). Phonology and Syntax: The Relation between Sound and Structure.
Cambridge, MA: The MIT Press.
Selkirk, E. (1986). On derived domains in sentence phonology. Phonology Yearbook 3,
371-405.
Selkirk, E. (to appear). The prosodic structure of function words. In J. Martin & K.
Demuth (eds), International Conference on Bootstrapping from Speech to Grammar in
Early Acquisition, Brown University. Hillsdale, N.J.: Lawrence Erlbaum.
Shattuck-Hufnagel, S. & A. Turk. (1996). A prosody tutorial for investigators of auditory
sentence processing. Journal of Psycholinguistic Research 25(2), 193-247.
Shibatani, M. (1990). Languages of Japan. Cambridge University Press.
Shibuya, Y. (1997). Differences Between Native and Non-native Speakers’ Realization of
Stress-Related Duration and Pitch Patterns in American English. A qualifying paper in
Georgetown University Linguistics Department.
Strange, W., O.-S. Bohn, S. A. Trent, M. C. McNair & K. C. Bielec. Context and
speaker effects in the perceptual assimilation of German vowels by American listeners.
In Proceedings of the Fourth International Congress on Spoken Language Processing
(ICSLP 96). University of Delaware and A. I. du Pont Institute.
Sugito, M. (1982a). Nihongo Akusento no Kenkyu, Tokyo: Sanseido Shuppan.
Sugito, M. (1982b). Eibeijin oyobi nihonjin no hatsuwa ni okeru [I] oyobi [i] no
onkyooteki tokuchyoo. [republished in Sugito, M. Nihonjin no Eigo (1996). Osaka:
Izumi Shoin.]
Sugito, M. (1996). Nihonjin no Eigo. Tokyo: Izumi Shuppan.
Takeda, K., Sagisaka, Y., and Kuwabara, H. (1989). On sentence-level factors governing
segmental duration in Japanese. Journal of the Acoustical Society of America 86,
2081-2087.
Tateishi, K. (1989). Theoretical implications of the Japanese musicians’ language.
WCCFL 8, 384-398.
Todaka, Y. (1990). An Error Analysis of Japanese Students’ Intonation and Its Prosodic
Analysis. MA thesis, UCLA.
Trask, R. L. (1996). A Dictionary of Phonetics and Phonology. London: Routledge.
Trubetzkoy, N. S. (1958). Grundzüge der Phonologie. Göttingen: Vandenhoeck and
Ruprecht.
Tsujimura, N. (1996). An Introduction to Japanese Linguistics. Oxford: Blackwell.
Turk, A. & J. R. Sawusch. The domain of accentual lengthening in American English.
Journal of Phonetics 25(1), 25-42.
Uchida, T. (1996). Chuugokujin Nihongo Gakushuusha ni Okeru Chyooon Sokuon
Hatsuon no Chyookakutekininchi no Tokuchyoo: Gaikokujin no tame no Nihongo
Onseikyooiku ni Okeru Tokushyuhaku no Mondai ni Kansuru Chyookakuteki Kiso
Kenkyuu. Ph.D. dissertation, Nagoya University.
Ueyama, M. (1995). Phrase-final Lengthening and Stress-timed Shortening Effects in the
speech of Native Speakers and Japanese learners of English. Unpublished MA thesis,
UCLA.
Ueyama, M. (1996). Phrase-final Lengthening and Stress-timed Shortening in the speech
of Native Speakers and Japanese learners of English. Proceedings of the International
Conference on Spoken Language Processing, Philadelphia.
Ueyama, M. & S.-A. Jun. (1998). Focus realization in Japanese English and Korean
English intonation. Japanese/Korean Linguistics Vol. 7, CSLI/Stanford University
Press.
Ueyama, M. (1999). An experimental study of vowel duration in phrase-final contexts in
Japanese. UCLA Working Papers in Phonetics 97, 174-182.
Umeda, N. (1975). Vowel duration in American English. Journal of the Acoustical Society
of America 58, 434-45.
van Santen, J. P. H. (1992). Contextual effects on vowel duration. Speech
Communication 11, 513-546.
Vance, T. (1987). An Introduction to Japanese Phonology. SUNY Press.
Venditti, J. (1995). Japanese ToBI Labeling Guidelines. Manuscript, Ohio State
University.
Venditti, J. (2000). Discourse Structure and Attentional Salience Effects on Japanese
Intonation. Ph.D. dissertation, Ohio State University.
Venditti, J. & J. van Santen. (1998). Modeling segmental durations for Japanese
text-to-speech synthesis. In Proceedings of the 3rd ESCA TTS Workshop, Jenolan Caves,
Australia (CD-Rom version).
Vihman, M. M. (1996). Phonological Development: The Origins of Language in the Child.
Oxford: Blackwell.
Warner, N. & T. Arai. (submitted for a review). Japanese mora-timing: A review.
Watanabe, K. (1987). Sentence stress perception by Japanese students. Journal of
Phonetics 16, 181-186.
Weinreich, U. (1953). Languages in contact. The Hague: Mouton.
Weitzman, R. (1969). Japanese Accent: An Analysis Based on Acoustic-Phonetic Data.
Ph.D. dissertation, University of Southern California.
Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., and Price, P. J. (1992).
Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the
Acoustical Society of America 92, 1707-17.