Production and Perception of Voicing Contrasts in English Word-Final Obstruents: Assessing the Effects of Experience and Starting Age

Natalia Fullana1, Joan C. Mora2
1 University of Ottawa, Canada
2 Universitat de Barcelona, Spain
[email protected], [email protected]

1. Introduction

Research into the acquisition of the English sound system as a second language (L2) and/or foreign language (FL) has shown that Romance language native speakers have difficulty in perceiving and producing English consonant voicing in word-final position. Such difficulty has frequently been ascribed either to the lack of word-final consonants in the learners’ first language (L1) (e.g., Spanish and Italian) or to the non-occurrence of voiced consonants – particularly stops – in absolute word-final position (e.g., Catalan). Consequently, Romance language speakers often resort to L1-based production strategies. Thus, they have been found to devoice voiced obstruents in word-final position (Cebrian, 2000; Flege & Davidian, 1984; Flege, Munro, & Skelton, 1992; see also Yavaş, 1994), and to a lesser degree, to spirantize and delete those sounds (Flege & Davidian, 1984).

Failure to perceive and produce the consonant voicing contrast in a native-like manner on the part of Romance language learners of English has further been associated with age of onset of L2 learning and experience in the target language (TL). For instance, Flege, Munro, and MacKay (1995b) examined the production of English /p/-/b/, /t/-/d/, and /k/-/g/ contrasts by Italian L2 learners with an average length of residence (LOR) in Canada of 32 years. It was observed that native Italian speakers with a starting age of L2 learning (AOL) of 3 to 13 years produced English final /t/-/d/ tokens accurately, whereas those participants with an AOL of 15 to 21 years did not succeed in producing the /t/-/d/ contrast at native-like rates.
Moreover, late L2 learners’ performance on English /p/-/b/ and /k/-/g/ was significantly different from that of English native speakers (NSs) (learners’ AOLs of 19 to 21 years and 17 to 21 years, respectively). The authors also noted that several individual late L2 learners produced the voicing contrast between /p/-/b/ and /k/-/g/ within the English native speaker range.

Although late L2 learners are commonly reported to perceive and produce English consonant voicing contrasts in a nonnative-like manner, several studies have indicated that late Romance language learners of English might enhance their perceptual and production skills as a result of greater experience in the TL. Thus, adult Spanish L2 learners with 9 years of experience (or exposure) in English in an immersion setting approximated the English-like production of /t/-/d/ in word-final CVC context, as opposed to Spanish late L2 learners with a significantly lower amount of experience in English – 0.4 years (Flege et al., 1992). In the case of FL learning settings, Catalan/Spanish bilingual advanced learners of English discerned voicing contrasts – i.e., /t/-/d/, /s/-/z/, and /t/-/d/ – at significantly higher correct rates after an 80-hour period of formal instruction at their home university, whereas additional exposure by means of a 3-month stay abroad period in English-speaking countries did not contribute to a consistently and significantly better discrimination rate of English voicing contrasts (Mora, 2007).

A rather precise account of these findings concerning differential experience outcomes and, above all, starting age effects on Romance language NSs’ perception and production of English voicing contrasts in word-final position lies in Flege’s (1995) Speech Learning Model (SLM).
Based on the assumption that “the phonetic systems used in the production and perception of vowels and consonants remain adaptive over the life span, and that phonetic systems reorganize in response to sounds encountered in an L2 through the addition of new phonetic categories, or through the modification of old ones” (Flege, 1995, p. 233), the SLM makes predictions about the perception and production of L2 vowels and consonants. As for L2 consonant production and perception, the earlier learners start acquiring the TL, the more likely they will be to detect phonetic differences between L1 and L2 consonant segments, which then might lead to the establishment of additional phonetic categories for L2 consonants. However, in those instances where the phonetic distance between an L1 consonant and an L2 consonant is perceived to be smaller – by means of equivalence classification (Flege, 1987) – L2 learners appear to produce the TL sounds with intermediate values between the typical values of L1 sounds and those of L2 sounds (e.g., VOT in Flege, 1987; Flege & Eefting, 1987). Similarly, the SLM also hypothesizes that learners who lack L2 consonant sounds in certain allophonic positions in their L1 sound inventory will also be able to establish new phonetic categories for those sounds. Therefore, it is predicted that learners will perceive and produce L2 sounds accurately. Yet various studies have revealed that, in spite of lacking voiced stops in word-final position, native Spanish and Italian late learners, among others, failed to produce English voiced stops in word-final position (e.g., Flege et al., 1992, for Spanish; Flege et al., 1995, for Italian). Moreover, and unlike predictions of the SLM about improved discernment and production of L2 sounds as a function of experience, the various amounts of experience of Spanish NSs and long-term exposure of native Italian participants did not lead to a more accurate production of the distinctive feature of voicing. 
Taking all of the above into account, the present study aimed to further investigate the perception and production of voicing contrasts in English word-final obstruents /s/-/z/, /p/-/b/, and /t/-/d/ by Romance language speakers, namely Catalan/Spanish advanced learners of English in a formal learning setting. In addition, this study examined the effects of starting age of FL learning (before vs. after 8 years) and experience in English (amount of formal instruction and extracurricular exposure at home and/or overseas) on the perception and production of English voicing contrasts.

2. Method

2.1 Participants

The participants in the present study were 48 non-native speakers of English (NNSs) learning English as a foreign language in a formal instruction context. They were bilingual speakers of Catalan and Spanish studying at university in Barcelona (33 female, 15 male) with a mean age of 21.2 years and an advanced level of competence in English. All of them had learnt English mainly through formal classroom instruction in Catalonia, but differed in starting age of FL learning – i.e., onset age at or before 8 vs. onset age after 8 – and in amount of experience in the L2 – i.e., school exposure only vs. extra exposure throughout schooling and/or through short (< 3 months) stay-abroad periods (with or without a formal instruction component). A group of 4 native speakers of Standard Southern British English (SSBE) (1 female, 3 male) produced the stimuli for the construction of the perception and production tasks in the present study and provided base-line data for the production task.
New Sounds 2007: Proceedings of the Fifth International Symposium on the Acquisition of Second Language Speech

2.2 Materials

Learners’ accuracy in vowel and consonant perception and production was assessed, respectively, through a categorial AXB discrimination task and a sentence imitation task using a delayed repetition technique (Flege, Munro, & MacKay, 1995a). The present paper reports on consonant data only (see Mora & Fullana, 2007, for results concerning the vowel data in the study) and, in particular, it examines learners’ ability to perceive and produce voicing contrasts in English word-final obstruents, specifically /s/-/z/, /p/-/b/, and /t/-/d/.

The consonant discrimination task was a forced-choice AXB test consisting of 72 triads (24 triads per contrast) with 0.5 inter-stimulus intervals (ISI) and 1.5 inter-trial intervals (ITI) distributed into three blocks. Each block tested participants on a single consonant contrast and contained 24 fully randomized triads per contrast: 4 orders (AAB, ABB, BAA, BBA) × 2 minimal pairs per contrast (/s/-/z/: pace-pays, ice-eyes; /p/-/b/: rope-robe, tripe-tribe; and /t/-/d/: mate-made, sight-side) × 3 repetitions each by a different speaker (1 female, 2 male). Diphthongs were chosen as the pre-consonantal context in order to maximize the effect of consonant voicing on vowel duration.

The delayed sentence repetition (DSR) task consisted of a set of 12 mini-dialogues eliciting the 6 target consonant sounds in the same words used in the perception task. The 12 elicited words (pace, pays, ice, eyes, rope, robe, tripe, tribe, mate, made, sight, side) were embedded in the carrier phrase “I said ___” that the participants imitated from a SSBE native speaker model (e.g., – What did you say? – I said ice).

The three English voicing consonant contrasts in word-final position were chosen because the voicing contrast is lacking in this position at the phonetic level both in Spanish and Catalan.
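The arithmetic of the AXB design described above (4 orders × 2 minimal pairs × 3 repetitions = 24 triads per contrast, 72 in total) can be checked with a short enumeration. This is only an illustrative sketch: the dictionary keys, the choice of the voiceless word as item "A", and the `repetition` index are assumptions made here, and neither the randomization nor the assignment of speakers to repetitions is modeled.

```python
from itertools import product

# Design parameters taken from the task description.
ORDERS = ("AAB", "ABB", "BAA", "BBA")
MINIMAL_PAIRS = {
    "/s/-/z/": [("pace", "pays"), ("ice", "eyes")],
    "/p/-/b/": [("rope", "robe"), ("tripe", "tribe")],
    "/t/-/d/": [("mate", "made"), ("sight", "side")],
}
REPETITIONS = 3


def build_triads():
    """Enumerate every AXB triad together with its expected answer."""
    triads = []
    for contrast, pairs in MINIMAL_PAIRS.items():
        combos = product(pairs, ORDERS, range(1, REPETITIONS + 1))
        for (word_a, word_b), order, repetition in combos:
            # Assumption: "A" stands for the voiceless word of the pair.
            words = tuple(word_a if slot == "A" else word_b for slot in order)
            # Answer "A" when the middle item matches the first (AAB, BBA),
            # answer "B" when it matches the last (ABB, BAA).
            answer = "A" if order[1] == order[0] else "B"
            triads.append({
                "contrast": contrast,
                "order": order,
                "repetition": repetition,
                "words": words,
                "answer": answer,
            })
    return triads


triads = build_triads()
per_contrast = {c: sum(t["contrast"] == c for t in triads) for c in MINIMAL_PAIRS}
print(len(triads), per_contrast)
```

Under these assumptions the enumeration yields 24 triads in each of the three blocks, and the answer key follows the rule given in the procedure (A for AAB or BBA, B for ABB or BAA).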
Thus, whereas in English word-final obstruents may range from fully voiced to completely devoiced and the preceding vowel is significantly shorter before voiceless than voiced consonants, Catalan has a rule of terminal devoicing whereby underlyingly voiced obstruents are realized as voiceless in word-final position (freda “cold” (fem) /frɛdə/-[frɛðə]; cf. fred “cold” (masc) [frɛt]). In other word positions, however, the pairs /s/-/z/, /p/-/b/ and /t/-/d/ are phonologically contrastive (caça “hunting” /kasə/ vs. casa “house” /kazə/; pes “weight” /pɛs/ vs. bes “kiss” /bɛs/; teu “yours” /tew/ vs. deu “god” /dew/). In Spanish, /d/ (phonetically denti-alveolar [d̪]) is the only voiced oral stop which may occur in word-final position (ciudad “city” /θiwdad/), but it is typically spirantized and realized as [ð] ([θiwðað]; or as [] or [], depending on dialectal variation). Spanish lacks voiced fricative phonemes. A voiced alveolar fricative [z] is found as an allophone of /s/ before voiced consonants (e.g., mismo [mizmo] “same”, desde [dezðe] “from”) but never in word-final position.

Consequently, Catalan/Spanish learners’ productions of English word-final /z/, /b/ and /d/ were expected to differ from those of English native speakers in two fundamental ways: (a) frication noise (for /z/) and stop closure (for /b/ and /d/) often lacking vocal cord vibration, and (b) relatively small effect of stop voicing on the duration of preceding vowels. English native speakers have been shown to rely mainly on vowel duration as a perceptual cue to the voicing feature in word-final stops and fricatives, together with closure/frication duration, voicing during closure/frication and F1 offset frequency as secondary cues.
Because the effect of consonant voicing on the duration of preceding vowels is larger in English than in Catalan/Spanish, and /z/, /b/ and /d/ do not occur word-finally in Catalan/Spanish, it was expected that learners would show some perceptual difficulty in discriminating the word-final voicing contrasts /s/-/z/, /p/-/b/ and /t/-/d/.

2.3 Procedures

In the AXB discrimination task, participants were asked to identify the two words in the triad with the same consonant by ticking the appropriate option on an answer sheet (A for AAB or BBA triads; B for ABB or BAA triads). A set of AXB triads with minimal pairs not included in the experimental task itself, but with the same ISI and ITI, was provided for practice. The DSR task was administered individually in a sound-proof booth. The participants heard the mini-dialogues over headphones and imitated the native speaker model’s phrase “I said __”. Participants’ productions were digitally recorded (at 22.05 kHz, 16-bit) and computer-edited for subsequent acoustic analysis with Praat (Boersma & Weenink, 2007). A practice set was also provided at the beginning of the DSR task.

2.4 Analyses

Perceptual accuracy was assessed through an analysis of participants’ mean percent correct discrimination scores on the AXB task. Accuracy in production was investigated through acoustic analyses of participants’ productions of the target word-final consonants in the DSR task, which included measures of vowel duration, consonant duration, presence/absence of voicing during friction (for /s/ and /z/) and stop closure (for /p/, /b/, /t/ and /d/), and presence/absence of a release burst (for the stop consonants). Duration measures were obtained from sound waveforms and spectrograms generated by Praat.
Vowel length was obtained by placing a beginning cursor in the middle of the first peak of periodic energy indicating the onset of the vowel and an end cursor at the last peak of periodic energy that coincides with a drastic drop in energy indicating the beginning of the stop closure (or at the last peak of periodic energy just before the beginning of the high-frequency noise characteristic of /s/ and /z/). Consonant duration was measured from the beginning of the stop closure (indicated by lack of energy on spectrograms) up to the beginning of the release burst, thus measuring the duration of the closing phase of the stop rather than the duration of the whole stop (which would also include the duration of the release burst and any subsequent frication noise produced by aspiration). As the word-final consonants were elicited in a sentence-final position context, reliable consonant duration measures could only be obtained for fricatives (100%) and for those stops that were produced with a detectable release burst. The proportion of sentence final stops produced with audible release varied according to place of articulation and voicing: 49.2% of /b/s, 58.5% of /d/s (i.e., 53.85% of voiced stops on average); 52.25% of /p/s and 73.25% of /t/s (i.e., 62.75% of voiceless stops on average). Wilcoxon Signed Ranks tests were performed to explore differences in vowel and consonant duration among minimal pairs within a given contrast. The effect of experience and onset age of learning on learners’ ability to accurately perceive and produce the /s/-/z/, /p/-/b/, and /t/-/d/ consonant contrasts was explored through Kruskal-Wallis and Mann-Whitney U tests. 
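To make the group comparisons concrete, the following is a minimal pure-Python sketch of the Mann-Whitney U statistic for two independent groups. The source does not say which software was used, so the function names are hypothetical, the sample values are invented for illustration, and the Z score uses a plain normal approximation without tie or continuity correction, which is an assumption and may not match the reported Z values exactly.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_1, group_2):
    """Return (U, Z): the smaller U statistic and an uncorrected normal-approximation Z."""
    n1, n2 = len(group_1), len(group_2)
    ranks = average_ranks(list(group_1) + list(group_2))
    rank_sum_1 = sum(ranks[:n1])
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    # Assumption: no tie correction and no continuity correction.
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, z


# Hypothetical vowel-duration differences (ms) for two speaker groups.
learners = [55.0, 70.5, 62.0, 48.0, 81.0]
natives = [130.0, 142.5, 128.0, 150.0]
u_stat, z_score = mann_whitney_u(learners, natives)
print(u_stat, round(z_score, 3))
```

With more than two groups (for example the four stay-abroad groups), the same ranking step would feed a Kruskal-Wallis test instead.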
Experience was further operationalized by means of two variables: extra exposure, which groups together participants who had not been exposed to English outside school versus those who had received extra exposure to English (mainly through regular attendance at English courses offered by private language schools); and stays abroad, which divides the pool of participants in the study into four groups according to whether they had benefited from short-term stay abroad periods (never, ≤3 months, >3 months), with or without simultaneous formal instruction (FI) abroad. Given the FL learning context of the Catalan/Spanish learners of English in the present study, starting age divides the experimental group into two age groups: onset age of FL learning at or before 8 years of age (range: 3–8, M = 6.88, SD = 1.36) and after 8 (range: 9–12, M = 10.84, SD = 1.01).

3. Results

3.1 AXB discrimination task

The overall percent correct discrimination scores for /s/-/z/, /p/-/b/, and /t/-/d/ contrasts obtained on the AXB discrimination task were fairly similar among learners (see Table 1 and Figure 1 below). A Friedman test performed on those scores showed that the three consonant contrasts were indeed discerned at similar high correct rates (χ² = 1.264, p = .532).

Table 1. Percent correct discrimination scores on the AXB task

| NNSs | /s/-/z/ | /p/-/b/ | /t/-/d/ |
|---|---|---|---|
| M | 93.40 | 92.44 | 92.18 |
| SD | 6.53 | 7.86 | 8.27 |

Even so, male participants were found to perceive the three consonant contrasts at higher correct rates than female participants (see Figure 1), resulting in significant differences in /s/-/z/ and /t/-/d/ (U = 123, Z = –2.855, p < .05; and U = 162.5, Z = –1.949, p < .05, respectively) and in near-significant differences in /p/-/b/ (U = 168, Z = –1.813, p = .070).

[Bar chart: percent correct discrimination (y-axis 90–100) for /s/-/z/, /p/-/b/, and /t/-/d/; groups: All, Female, Male]

Figure 1.
Percent discrimination scores for all participants and as a function of gender

As for the factors under study, the Mann-Whitney U tests and Kruskal-Wallis analyses carried out failed to reveal any significant differences in discrimination scores according to age of onset of FL learning, amount of exposure to English throughout schooling, and stays abroad. At the most, differences in scores approached significance in the discernment of /p/-/b/ in favour of those participants with no extra exposure to English throughout schooling vs. those participants with some degree of extra exposure (95.23 vs. 91.14; U = 150.5, Z = -1.801, p = .072).

Despite the lack of significant differences, the following trends were observed. Firstly, late starters tended to better discern the three voicing contrasts, in particular, /s/-/z/ and /p/-/b/ (see Figure 2). Moreover, those participants with school exposure only, as well as those without any stay-abroad period, perceived those contrasts at higher correct rates (Figure 3). Finally, those participants with a stay-abroad period longer than 3 months and with language course(s) also discriminated the voicing contrasts more accurately (Figure 4).

[Bar chart: percent correct discrimination (y-axis 90–100) for /s/-/z/, /p/-/b/, and /t/-/d/; groups: ≤ age 8, > age 8]

Figure 2. Percent discrimination scores as a function of age of onset of FL learning

[Bar chart: percent correct discrimination (y-axis 90–100) for /s/-/z/, /p/-/b/, and /t/-/d/; groups: no extra exposure, extra exposure]

Figure 3. Percent discrimination scores as a function of exposure to English

[Bar chart: percent correct discrimination (y-axis 87–100) for /s/-/z/, /p/-/b/, and /t/-/d/; groups: never, ≤ 3 months w/out course, ≤ 3 months w/course, > 3 months w/course]

Figure 4.
Percent correct discrimination scores as a function of stay-abroad periods

3.2 Delayed sentence repetition task

The statistical tests performed on the acoustic data obtained in the production task failed to yield significant differences as a function of AOL and experience, suggesting that neither age of onset of FL learning nor at-home (AH) and/or stay-abroad (SA) extracurricular exposure to English had a significant effect on learners’ ability to accurately produce the English word-final voicing contrasts examined in the present study, as measured through vowel and consonant duration and the presence/absence of glottal pulsing and a release burst in the realization of the contrasting stops.

As expected, the acoustic measurements revealed that NNSs’ realization of the voicing contrasts examined differed from that of NSs in segmental length (see Table 2). NSs produced a longer voiceless fricative (/s/) and a shorter voiced fricative (/z/) than NNSs. A similar, less consistent, pattern was found in the duration of the closure phase of labial stops, which was longer in /p/ in rope and shorter in /b/ in robe for NSs. The duration of the stop closure, however, was always shorter for NSs than for NNSs in the labial stops in tripe and tribe and the alveolar stops /t/ and /d/. NSs, however, were very consistent in producing a large systematic difference in stop closure/friction duration between the voiced and voiceless member of the contrasting pairs of consonants. This difference in duration is much smaller and less consistent in NNSs’ productions (see Figure 5). In fact, the duration differences between voiced and voiceless consonants are very small (0.61 ms for /p/-/b/ and 3.86 ms for /t/-/d/ on average), except for the /s/-/z/ contrast (24.51 ms on average), where a more substantial difference in duration is observed.
These results suggest that NNSs make much less use of consonantal duration as a phonetic cue to implement the voicing contrast in /s/-/z/, /p/-/b/ and /t/-/d/. Mann-Whitney U tests revealed significant differences between NSs and NNSs in the duration difference between voiceless and voiced contexts for fricatives and stop closures (see Table 3), except for labial stops and the word pair mate-made (here the difference approached significance: p = .059, Z = -1.912; see also Figure 3).

Table 2. Segmental length differences between NNSs and NSs (SD in parentheses)

| Duration (ms) | Word | Preceding vowel: NNSs | Preceding vowel: NSs | Fricative/stop closure: NNSs | Fricative/stop closure: NSs |
|---|---|---|---|---|---|
| /s/-/z/ | ice | 259.45 (51.11) | 208.25 (94.32) | 195.24 (50.37) | 270.25 (20.17) |
| | eyes | 329.52 (54.24) | 422.50 (79.77) | 185.98 (50.27) | 153.50 (18.26) |
| | pace | 213.91 (43.09) | 155.33 (8.32) | 211.70 (58.93) | 249.50 (33.67) |
| | pays | 297.43 (63.31) | 338.50 (40.71) | 171.91 (56.27) | 150.50 (30.11) |
| /p/-/b/ | rope | 184.07 (32.53) | 154.75 (21.42) | 83.56 (22.89) | 89.25 (26.56) |
| | robe | 244.94 (53.38) | 293.25 (50.10) | 85.57 (21.24) | 57.75 (5.18) |
| | tripe | 173.74 (32.25) | 151.00 (35.34) | 92.50 (19.66) | 85.00 (10.44) |
| | tribe | 235.46 (51.01) | 285.00 (69.93) | 89.67 (27.20) | 57.33 (10.40) |
| /t/-/d/ | mate | 214.75 (36.49) | 167.50 (29.80) | 85.88 (43.91) | 64.25 (24.79) |
| | made | 273.09 (53.32) | 297.25 (40.42) | 80.64 (32.09) | 41.50 (10.47) |
| | sight | 225.21 (44.58) | 195.75 (60.02) | 90.53 (34.11) | 71.50 (25.59) |
| | side | 281.38 (55.83) | 326.25 (66.62) | 88.04 (26.65) | 43.50 (5.06) |

[Bar chart: NNSs vs. NSs (y-axis 0–300); words grouped as Voiceless (ice, pace, rope, tripe, mate, sight) and Voiced (eyes, pays, robe, tribe, made, side)]

Figure 5. Fricative and stop closure duration (ms)

[Bar chart: NNSs vs. NSs (y-axis 0–450); words grouped as Voiceless (ice, pace, rope, tripe, mate, sight) and Voiced (eyes, pays, robe, tribe, made, side)]

Figure 6. Vowel duration (ms)

Table 3.
Stop closure duration differences before voiceless and voiced consonants according to speaker group (ms)

| Word pair | NNSs | NSs | Mann-Whitney U test |
|---|---|---|---|
| ice-eyes | 6.17 (46.85) | 116.75 (14.86) | U = 3.000; Z = -3.178; Sig. = .000 |
| pace-pays | 38.93 (56.95) | 99.00 (25.62) | U = 31.000; Z = -2.155; Sig. = .029 |
| rope-robe | 1.66 (39.51) | 31.50 (31.67) | U = 9.000; Z = -1.389; Sig. = .199 |
| tripe-tribe | 7.46 (16.18) | 27.66 (18.77) | U = 7.000; Z = -1.682; Sig. = .111 |
| mate-made | 8.13 (48.19) | 22.75 (17.28) | U = 18.000; Z = -1.912; Sig. = .059 |
| sight-side | -1.36 (30.70) | 28.00 (21.36) | U = 14.500; Z = -2.246; Sig. = .020 |

NSs’ and NNSs’ productions were also found to differ with respect to the duration of the vowel preceding the oral stop (see Table 2). NSs produced shorter vowels before voiceless stops and fricatives and longer vowels before voiced stops and fricatives than NNSs (see Figure 6). This pattern appears to be systematic and without exception, suggesting that the difference in the duration of the preceding vowel is always greater for NSs than for NNSs and therefore NSs can make use of this difference in length as a cue to voicing. A series of Mann-Whitney U tests revealed that the difference in vowel duration before voiced and voiceless fricatives and oral stops is always significantly greater for NSs than for NNSs (see Table 4). NNSs, like NSs, appear to make use of vowel duration as a cue to voicing and also consistently produce significant differences in vowel length according to the voicing specification of the following obstruent. The size of the difference in length, however, is significantly smaller in NNSs’ production (see Figure 7).

Table 4. Vowel duration differences before voiceless and voiced consonants according to speaker group (ms)
| Word pair | NNSs | NSs | Mann-Whitney U test |
|---|---|---|---|
| ice-eyes | 70.76 (58.64) | 214.25 (34.75) | U = 3.000; Z = -3.189; Sig. = .001 |
| pace-pays | 84.69 (54.99) | 193.66 (48.52) | U = 8.000; Z = -2.544; Sig. = .011 |
| rope-robe | 61.80 (55.18) | 138.50 (42.06) | U = 10.000; Z = -2.530; Sig. = .011 |
| tripe-tribe | 59.68 (50.82) | 134.00 (40.73) | U = 8.000; Z = -2.492; Sig. = .013 |
| mate-made | 57.55 (38.33) | 129.75 (20.64) | U = 8.000; Z = -3.014; Sig. = .003 |
| sight-side | 54.47 (40.45) | 130.50 (27.08) | U = 8.000; Z = -2.963; Sig. = .003 |

[Bar chart: NNSs vs. NSs (y-axis 0–250); panels: Vowel duration; Fricative and stop closure; x-axis: ice-eyes, pace-pays, rope-robe, tripe-tribe, mate-made, sight-side]

Figure 7. Duration differences between voiceless vs. voiced contexts

Acoustic analyses of the word-final obstruents produced by NNSs and NSs revealed differences in the presence of vocal cord vibration during the production of fricatives and the closure phase of stops as well as in whether the oral stop closures were unreleased or audibly released (with or without friction noise following the release burst). Native speakers behaved very consistently in that all underlyingly voiceless consonants were produced without vocal cord vibration and all underlyingly voiced consonants were at least partially voiced. Devoicing is a common feature of English word-final obstruents (e.g., Cruttenden 1994, p. 141), but it does not occur in the NSs’ productions analyzed here, probably due to the fact that the NS informants were instructed to read the carrier phrases to be used in the imitation task “carefully” at “normal speed” and the target word-final consonants always occurred in absolute sentence-final position, at the end of the carrier phrase.
As regards the release phase of oral stops, NSs produced all oral stops with audible release bursts and in voiceless stops the release burst was typically followed by friction produced by the sudden release of the air pressure at the opening of the oral closure. As expected, NNSs were much less consistent in the use of voicing and in fact the percentage of underlyingly voiced consonants that were fully devoiced was not very different from the percentage of underlyingly voiceless consonants that were produced without vocal cord vibration. The fact that NNSs produced some occurrences of voiceless /s/ (28.7%), /p/ (8%) and /t/ (2.1%) with vocal cord vibration suggests that these might have been interpreted as being underlyingly voiced in the imitation task (/s/ in particular), which would be indicative of perceptual problems when attending to the phonetic cues that distinguish English voiced and voiceless obstruents in word-final position. The results of the discrimination task, however, do not seem to confirm this hypothesis, as overall correct discrimination scores were at ceiling and little difference is found across contrasting sounds in mean percent correct discrimination (see Table 1).

[Bar chart: percentage (y-axis 0–100) for /s/-/z/, /p/-/b/, and /t/-/d/; series: Underlyingly voiceless, Underlyingly voiced]

Figure 8. Percentage of word-final obstruents realized without vocal cord vibration

Thus, underlyingly voiced stops are only slightly less frequently realized without vocal cord vibration than underlyingly voiceless consonants (see Figure 8) and the /s/-/z/ contrast appears to be virtually non-existent with respect to voicing.
A similar pattern is obtained when the percentage of audible release bursts followed by friction noise is calculated for voiced and voiceless consonants, which is only slightly higher for voiceless than for voiced consonants, as opposed to the pattern observed in the NSs’ productions, which present a huge contrast between voiced (0%) and voiceless stops (81.25% on average).

The segmental duration differences observed between NNSs’ and NSs’ productions suggest that NNSs succeed in making use of the duration differences in vowel length resulting from the voicing specification of the following consonant as a cue to voicing, but do not do so to the extent that NSs do. The significantly shorter vowel duration measures obtained in NNSs’ productions are not compensated for by producing greater differences than NSs in fricative and stop closure duration between voiced and voiceless consonants, or by consistently making use of closure voicing in voiced consonants. This confirms that the word-final voicing contrasts examined are less robustly realized by NNSs than by NSs. Therefore, we would predict a positive effect of greater experience in the L2 and an earlier onset of L2 learning in terms of a more robust native-like realization of the voicing contrasts in word-final position through a significant increase in segmental length differences, less devoicing in voiced consonants, absence of closure voicing in voiceless consonants and greater frequency of voiceless stops released with audible friction.

In order to explore the effect of onset age of L2 learning and experience on segmental duration in the realization of the voicing contrast in word-final obstruents, duration measures were submitted to a series of independent-sample tests (Mann-Whitney U and Kruskal-Wallis) with the following subject-grouping variables as factors: age of onset of L2 learning (AOL), extra exposure to English (EE), stays abroad (SA), use of English during the SA (SAU) and gender (G).
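The duration-difference measure that feeds these group comparisons can be sketched as follows: for each participant, the vowel duration before the voiceless member of a word pair is subtracted from the duration before the voiced member, and the differences are then averaged within each level of a grouping variable. The data layout, function names, participant codes, and all numeric values below are hypothetical and serve only to illustrate the computation.

```python
from collections import defaultdict
from statistics import mean

# Word pairs from the production task, as (voiceless, voiced).
WORD_PAIRS = [
    ("ice", "eyes"), ("pace", "pays"), ("rope", "robe"),
    ("tripe", "tribe"), ("mate", "made"), ("sight", "side"),
]


def vowel_duration_differences(measurements):
    """Map {participant: {word: vowel_ms}} to {participant: {pair: voiced - voiceless}}."""
    diffs = {}
    for participant, by_word in measurements.items():
        diffs[participant] = {
            f"{voiceless}-{voiced}": by_word[voiced] - by_word[voiceless]
            for voiceless, voiced in WORD_PAIRS
            # Pairs with a missing measurement are skipped for that participant.
            if voiceless in by_word and voiced in by_word
        }
    return diffs


def group_means(diffs, grouping):
    """Average the per-participant differences within each group label."""
    pooled = defaultdict(lambda: defaultdict(list))
    for participant, by_pair in diffs.items():
        for pair, value in by_pair.items():
            pooled[grouping[participant]][pair].append(value)
    return {
        group: {pair: mean(values) for pair, values in pairs.items()}
        for group, pairs in pooled.items()
    }


# Hypothetical vowel durations (ms) and AOL grouping.
measurements = {
    "P01": {"ice": 250, "eyes": 330},
    "P02": {"ice": 260, "eyes": 320},
    "P03": {"ice": 240, "eyes": 300, "rope": 180},
}
grouping = {"P01": "AOL<=8", "P02": "AOL>8", "P03": "AOL>8"}

diffs = vowel_duration_differences(measurements)
means = group_means(diffs, grouping)
print(means)
```

The per-participant differences, rather than the group means, are what a rank-based test such as Mann-Whitney U would take as input.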
In general, the analyses failed to reveal significant differences between the groups defined by the factors under examination, but some statistically significant differences in segmental duration were found for specific sound contrasts and some general tendencies could be observed (see Table 5).

Table 5. Segmental duration (ms) according to EE (SD in parentheses)

Vowel Duration

| Contrast | Word | NO extra exposure | Extra exposure |
|---|---|---|---|
| /s/-/z/ | ice | 262.36 (52.18) | 253.68 (49.62) |
| | eyes | 354.79 (55.42) | 314.78 (49.19) |
| | pace | 214.64 (51.87) | 210.23 (38.32) |
| | pays | 307.31 (74.47) | 289.91 (59.08) |
| /p/-/b/ | rope | 187.00 (36.77) | 181.00 (31.05) |
| | robe | 238.00 (38.70) | 250.58 (61.21) |
| | tripe | 172.07 (43.38) | 171.89 (24.58) |
| | tribe | 247.86 (58.09) | 229.77 (48.44) |
| /t/-/d/ | mate | 230.93 (40.30) | 206.22 (33.21) |
| | made | 291.21 (58.17) | 264.45 (50.81) |
| | sight | 232.43 (47.86) | 216.87 (39.08) |
| | side | 303.00 (61.23) | 265.56 (47.53) |

V. Duration Difference

| Contrast | Word pair | NO extra exposure | Extra exposure |
|---|---|---|---|
| /s/-/z/ | ice-eyes | 92.42 (38.23) | 61.67 (65.77) |
| | pace-pays | 91.53 (72.89) | 81.64 (48.78) |
| /p/-/b/ | rope-robe | 51.10 (32.27) | 75.57 (63.58) |
| | tripe-tribe | 75.78 (61.78) | 53.56 (41.53) |
| /t/-/d/ | mate-made | 60.28 (36.98) | 57.32 (38.88) |
| | sight-side | 63.84 (34.11) | 50.25 (44.49) |

Consonant Duration

| Contrast | Word | NO extra exposure | Extra exposure |
|---|---|---|---|
| /s/-/z/ | ice | 192.43 (47.22) | 196.48 (54.25) |
| | eyes | 188.71 (57.08) | 181.31 (47.19) |
| | pace | 218.29 (68.65) | 209.23 (56.74) |
| | pays | 190.08 (68.80) | 161.59 (49.52) |
| /p/-/b/ | rope | 89.88 (19.90) | 77.25 (25.21) |
| | robe | 83.20 (12.89) | 86.89 (25.39) |
| | tripe | 102.25 (15.96) | 91.12 (20.47) |
| | tribe | 101.00 (32.55) | 82.62 (23.71) |
| /t/-/d/ | mate | 89.78 (23.33) | 84.35 (50.11) |
| | made | 91.88 (26.72) | 72.61 (31.83) |
| | sight | 95.27 (28.41) | 86.23 (35.05) |
| | side | 112.38 (19.20) | 77.06 (22.97) |

For example, no significant segmental duration differences were found between subject groups according to AOL and no consistent pattern could be observed for vowel duration.
However, with the exception of /z/, earlier starters (AOL before 8) were found to produce slightly shorter voiced consonants and slightly longer voiceless consonants than later starters (AOL after 8). This native-like trend, however, is not fully confirmed by the duration measures obtained for vowels, as it is not always the case that vowel duration differences according to the voicing value of the following consonant are greater for earlier starters: this is the case for ice-eyes, rope-robe (p = .047; Z = -1.985), and tripe-tribe, but not for the rest of word pairs (see Figure 9).

[Bar chart: AOL <8 vs. AOL >8 (y-axis 0–250); words grouped as Voiceless (ice, pace, rope, tripe, mate, sight) and Voiced (eyes, pays, robe, tribe, made, side)]

Figure 9. Consonant duration (ms)

[Bar chart: NO EE vs. EE (y-axis 0–450); words grouped as Voiceless (ice, pace, rope, tripe, mate, sight) and Voiced (eyes, pays, robe, tribe, made, side)]

Figure 10. Vowel duration (ms) as a function of extra exposure

Vowel duration (except for robe) was found to be slightly shorter for those participants who had received extra exposure to English outside school than for those who had not. This vowel duration difference, however, cannot be interpreted as implying a gain in production accuracy because (a) it affected voiced and voiceless consonants similarly, reaching significance in the case of eyes (/z/: p = .045; Z = -0.242) and approaching significance in mate and side (/t/: p = .077; Z = -1.767; /d/: p = .067; Z = -1.834) and (b) the difference in vowel duration between voiced and voiceless contexts (except for the word pair rope-robe) tended to be smaller for the group that received extra exposure, reaching significance for the ice-eyes word pair (p = .05; Z = -1.962) (see Figure 10).
All of this suggests, contrary to our predictions, that the extra-exposure group would be less proficient than the no-extra-exposure group in making use of vowel duration differences to signal voicing in word-final obstruents. The analysis of consonant duration data shows a similar pattern of results, with the extra-exposure group in general producing slightly shorter voiced and voiceless consonants than the no-extra-exposure group (except for made). It would seem that extra exposure has the effect of slightly reducing segmental duration in general, both in vowels and consonants and irrespective of the voicing specification of the word-final obstruents.

When segmental duration was analyzed according to extra exposure to English obtained through stay-abroad periods (never, ≤3 months without FI, ≤3 months with FI, >3 months with FI), no significant differences were found. In this case, no consistent pattern or general tendency could be observed (see Table 6). However, as observed for those subjects who had received extra exposure to English outside school, there is a slight tendency for subjects reporting a greater amount of L2 use abroad (75% vs. 50% or 25%) to produce both vowels and consonants with shorter durations irrespective of their voicing value, except for robe and the closure duration of alveolar stops, which is longer for those subjects reporting 75% of everyday use of English while abroad.

Table 6. Segmental duration (ms) as a function of SAU (SD in parentheses)

Vowel duration
Sound  Word    ≤ 25%            50%              ≥ 75%
/s/    ice     272.36 (56.14)   266.09 (55.89)   232.45 (43.24)
/z/    eyes    329.00 (50.28)   324.55 (51.54)   316.36 (57.26)
/s/    pace    216.55 (47.28)   206.00 (50.84)   200.55 (35.26)
/z/    pays    316.73 (68.81)   279.91 (62.96)   290.45 (57.40)
/p/    rope    193.33 (45.15)   189.88 (29.48)   169.86 (17.29)
/b/    robe    242.00 (36.59)   234.83 (46.01)   256.88 (81.96)
/p/    tripe   175.73 (29.51)   179.11 (26.17)   149.44 (31.53)
/b/    tribe   230.64 (55.08)   258.50 (70.86)   224.80 (36.45)
/t/    mate    218.64 (40.48)   212.00 (42.06)   208.18 (23.83)
/d/    made    272.18 (50.13)   272.91 (54.52)   264.00 (60.99)
/t/    sight   219.18 (44.46)   232.73 (51.42)   210.20 (22.83)
/d/    side    274.09 (43.48)   265.00 (62.66)   269.57 (38.17)

Vowel duration difference
Word pair      ≤ 25%            50%              ≥ 75%
ice-eyes       56.63 (49.74)    58.45 (64.16)    83.90 (66.85)
pace-pays      100.18 (52.82)   79.00 (64.13)    89.90 (48.28)
rope-robe      47.75 (32.99)    46.66 (25.25)    105.66 (83.50)
tripe-tribe    54.90 (34.01)    87.87 (80.99)    68.50 (43.91)
mate-made      53.54 (37.21)    60.90 (43.92)    52.80 (44.15)
sight-side     54.90 (23.06)    32.60 (48.55)    65.57 (39.02)

Consonant duration
Sound  Word    ≤ 25%            50%              ≥ 75%
/s/    ice     201.30 (35.91)   201.64 (62.42)   173.80 (53.27)
/z/    eyes    200.82 (52.62)   177.00 (44.80)   166.45 (57.22)
/s/    pace    217.73 (49.86)   207.20 (77.78)   194.27 (56.31)
/z/    pays    161.18 (39.08)   180.82 (72.41)   153.09 (52.70)
/p/    rope    90.75 (27.03)    78.20 (19.98)    61.33 (13.05)
/b/    robe    82.00 (18.08)    83.00 (2.82)     93.60 (29.89)
/p/    tripe   102.75 (18.00)   93.00 (22.71)    97.25 (18.24)
/b/    tribe   99.00 (27.64)    83.33 (36.29)    97.60 (15.91)
/t/    mate    77.00 (29.28)    94.80 (97.24)    103.75 (25.59)
/d/    made    77.83 (26.28)    56.50 (8.50)     98.50 (28.24)
/t/    sight   90.90 (39.64)    68.29 (25.81)    105.56 (37.02)
/d/    side    83.75 (29.64)    74.20 (21.05)    104.33 (27.38)

As far as gender is concerned, female speakers tended to produce slightly longer vowels in all contexts (except for ice and robe), but none of these differences turned out to be significant, except for the /z/ in pays, which almost reached significance (p=.051; Z=-1.952). Female speakers also produced slightly greater (but non-significant) differences according to context (voiced vs.
voiceless consonant) in the alveolar fricative and oral stop sound contrasts. Except for labials and the /d/ in made, female participants also produced longer consonants than males, irrespective of voicing.

The effect of age of onset of L2 learning and experience on the realization of the voicing contrast in word-final obstruents with respect to the presence of vocal cord vibration (during friction and the closure phase of oral stops) and the presence of a release burst followed by audible friction was explored through Pearson χ2 and Fisher’s Exact Probability tests with the same subject-grouping variables as factors that were used to examine segmental duration – i.e., AOL, EE, SA, SAU, and G. In general, the analyses failed to reveal significant differences between the groups defined by these factors, suggesting that neither AOL nor experience had any effect on subjects' ability to accurately produce the English voicing contrast in word-final obstruents.

4. Discussion and conclusions

The present study examined the perception and production of voicing contrasts in English word-final obstruents – that is, /s/-/z/, /p/-/b/, and /t/-/d/ – by 48 Catalan/Spanish learners of English varying in age of onset of FL learning and experience in the TL in a formal learning context. Concerning the AXB discrimination task, neither starting age – before and after age 8 – nor experience – extra exposure and/or stays abroad – had a significant effect on the correct discrimination scores for the three consonant contrasts. What is more, learners discerned the three contrasts at similarly high correct rates, which is at odds with findings for an analogous population in Mora (2007), where Catalan/Spanish advanced learners of English perceived /t/-/d/ more accurately than /s/-/z/.
When taken together with findings in Mora and Fullana (2007), the learners in this study also differed from those in Mora (2007) in that the 48 Catalan/Spanish learners of English discriminated English consonant contrasts better than vowel contrasts (M = 92.67 and 86.85, respectively). Despite these differences and the lack of significant effects of the research variables, the observed tendency of a late starting age advantage in the perception of voicing contrasts is in line with findings from formal learning contexts (e.g., Gallardo & García-Lecumberri, 2006). Moreover, the somewhat better performance on the part of learners without extracurricular exposure or with longer stay-abroad periods corroborates the mixed exposure effects so far reported for both immersion and formal learning settings (e.g., Cebrian, 2006; Flege et al., 1992; Mora, 2007).

Additional non-significant starting age and exposure effects were obtained for learners’ production of English /s/-/z/, /p/-/b/, and /t/-/d/. However, unlike the perception results, earlier starters, rather than late starters, resembled English NSs more closely by producing slightly shorter voiced and longer voiceless consonants in word-final position, in addition to greater vowel duration differences in some word pairs, a result that agrees with the earlier starting age advantage expected in naturalistic settings. However, an increase in exposure by means of extracurricular language courses or stay-abroad periods did not result in learners’ consistent use of vowel duration differences as a cue to voicing in word-final obstruents. Thus, the latter finding further corroborates the inconclusive experience effects noted for Catalan/Spanish speakers’ discernment of English voicing contrasts. Besides, learners’ nonnative-like production of voicing contrasts that do not exist in their L1 did not conform to the predictions of the SLM (Flege, 1995).
Furthermore, Catalan/Spanish speakers’ realization of underlyingly voiced stops without vocal cord vibration at rates similar to those of underlyingly voiceless stops, and absence of voicing in voiced fricatives, agrees with production difficulties observed for Romance language speakers (Cebrian, 2000; Flege et al., 1995b). In sum, in the current FL learning setting, age of onset of FL learning and exposure to the TL failed to determine Catalan/Spanish bilinguals’ perception and production of voicing contrasts of English word-final /s/-/z/, /p/-/b/, and /t/-/d/. The findings of the present study further point to the need to address input quality – particularly, specific formal instruction that deals with the perception and production of English sounds – when assessing starting age and experience effects.

Acknowledgements

This study was funded by Grant HUM2004-05167/FILO from the Ministerio de Ciencia y Tecnología in Spain. Part of this work was also made possible thanks to a postdoctoral fellowship to the first author by the Secretaría de Estado de Universidades e Investigación del Ministerio de Educación y Ciencia (Spain). The authors would like to thank the Laboratori de Fonètica and Laboratori de Fonètica Aplicada at the Universitat de Barcelona, Anna Maria Agustí, Cristina Aliaga and Eva Cerviño for help with stimulus preparation and data collection, as well as all the participants in the study.

References

Boersma, P., & Weenink, D. (2007). Praat: Doing phonetics by computer (Version 4.6.05) [Computer program]. Retrieved June 2, 2007, from http://www.praat.org/
Cebrian, J. (2000). Transferability and productivity of L1 rules in Catalan-English interlanguage. Studies in Second Language Acquisition, 22, 1-26.
Cebrian, J. (2006). Experience and the use of non-native duration in L2 vowel categorization. Journal of Phonetics, 34, 372-387.
Flege, J. E. (1988). The development of skill in producing word-final English stops: Kinematic parameters. Journal of the Acoustical Society of America, 84, 1639-1652.
Flege, J. E. (1989). Chinese subjects’ perception of the word-final /t/-/d/ contrast: Performance before and after training. Journal of the Acoustical Society of America, 86, 1684-1697.
Flege, J. E. (1993). Production and perception of a novel, second-language phonetic contrast. Journal of the Acoustical Society of America, 93, 1589-1589.
Flege, J. E., & Davidian, R. D. (1984). Transfer and developmental processes in adult foreign language speech production. Applied Psycholinguistics, 5, 323-347.
Flege, J. E., & Eefting, W. (1987). Production and perception of English stops by native Spanish speakers. Journal of Phonetics, 15, 67-83.
Flege, J. E., & Hillenbrand, J. (1986). Differential use of temporal cues to the /s/-/z/ contrast by native and non-native speakers of English. Journal of the Acoustical Society of America, 79, 508-517.
Flege, J. E., & Port, R. (1981). Cross-language phonetic interference: Arabic to English. Language and Speech, 24, 125-146.
Flege, J. E., & Wang, C. (1989). Native language phonotactic constraints affect how well Chinese subjects perceive the word-final English /t/-/d/ contrast. Journal of Phonetics, 17, 299-315.
Flege, J. E., McCutcheon, M. J., & Smith, S. C. (1987). The development of skill in producing word-final English stops. Journal of the Acoustical Society of America, 82, 433-447.
Flege, J. E., Munro, M. J., & MacKay, I. R. A. (1995a). Factors affecting strength of perceived foreign accent in a second language. Journal of the Acoustical Society of America, 97, 3125-3134.
Flege, J. E., Munro, M. J., & MacKay, I. R. A. (1995b). Effects of age of second-language learning on the production of English consonants. Speech Communication, 16, 1-26.
Flege, J. E., Munro, M. J., & Skelton, L. (1992). Production of the word-final English /t/-/d/ contrast by native speakers of English, Mandarin and Spanish. Journal of the Acoustical Society of America, 92, 128-143.
Gallardo, F., & García-Lecumberri, M. L. (2006). Age effects on single phoneme perception for learners of English as a foreign language. In C. Abello-Contesse, R. Chacón-Beltrán, M. D. López-Jiménez, & M. M. Torreblanca-López (Eds.), Age in L2 acquisition and teaching (pp. 115-131). Bern, Switzerland: Peter Lang.
MacKay, I., Meador, D., & Flege, J. E. (2001). The identification of English consonants by native speakers of Italian. Phonetica, 58, 103-125.
McAllister, R. (2007). Strategies for realization of L2-categories. English /s/-/z/. In O.-S. Bohn, & M. J. Munro (Eds.), Language experience in second language speech learning. In honor of James Emil Flege (pp. 153-166). Amsterdam: John Benjamins.
Mora, J. C. (2007). Learning context effects on the acquisition of a second language phonology. In C. Pérez-Vidal, M. Juan-Garau, & A. Bel (Eds.), A portrait of the young in the new multilingual Spain (pp. 241-263). Clevedon: Multilingual Matters.
Mora, J. C., & Fullana, N. (2007). Production and perception of English /i/-/ɪ/ and /æ/-/ʌ/ in a formal setting: Investigating the effects of experience and starting age. Proceedings of the 16th International Congress of Phonetic Sciences (pp. 1613-1616). Saarbrücken, Germany.
Wang, Y., & Behne, D. (2007). Temporal remnants from Mandarin in nonnative English speech. In O.-S. Bohn, & M. J. Munro (Eds.), Language experience in second language speech learning. In honor of James Emil Flege (pp. 153-166). Amsterdam: John Benjamins.
Yavaş, M. (1994). Final stop devoicing in interlanguage. In M. Yavaş (Ed.), First and second language phonology (pp. 267-282). San Diego, CA: Singular.

_____________________ New Sounds 2007: Proceedings of the Fifth International Symposium on the Acquisition of Second Language Speech - 221 -