Production and Perception of Voicing Contrasts in English Word-Final
Obstruents: Assessing the Effects of Experience and Starting Age
Natalia Fullana1, Joan C. Mora2
1 University of Ottawa, Canada
2 Universitat de Barcelona, Spain
[email protected], [email protected]
1. Introduction
Research into the acquisition of the English sound system as a second language (L2) and/or
foreign language (FL) has shown that Romance language native speakers have difficulty in
perceiving and producing English consonant voicing in word-final position. Such difficulty
has frequently been ascribed either to the lack of word-final consonants in the learners’ first
language (L1) (e.g., Spanish and Italian) or to the non-occurrence of voiced consonants –
particularly stops – in absolute word-final position (e.g., Catalan). Consequently, Romance
language speakers often resort to L1-based production strategies. Thus, they have been found
to devoice voiced obstruents in word-final position (Cebrian, 2000; Flege & Davidian, 1984;
Flege, Munro, & Skelton, 1992; see also Yavaş, 1994), and to a lesser degree, to spirantize
and delete those sounds (Flege & Davidian, 1984).
Failure to perceive and produce the consonant voicing contrast in a native-like manner
on the part of Romance language learners of English has further been associated with age of
onset of L2 learning and experience in the target language (TL). For instance, Flege, Munro,
and MacKay (1995b) examined the production of English /p/-/b/, /t/-/d/, and /k/-/g/ contrasts
by Italian L2 learners with an average length of residence (LOR) in Canada of 32 years. It
was observed that native Italian speakers with a starting age of L2 learning (AOL) of 3 to 13
years produced English final /t/-/d/ tokens accurately, whereas those participants with an AOL
of 15 to 21 years did not succeed in producing the /t/-/d/ contrast at native-like rates.
Moreover, late L2 learners’ performance on English /p/-/b/ and /k/-/g/ was significantly
different from that of English native speakers (NSs) (learners’ AOLs of 19 to 21 years and 17
to 21 years, respectively). It should also be mentioned that the authors commented on the fact
that several individual late L2 learners produced the voicing contrast between /p/-/b/ and /k/-/g/
within the English native speaker range.
In spite of the commonly reported late L2 learners’ nonnative-like perception and
production of English consonant voicing contrasts, several studies have indicated that late
Romance language learners of English might enhance their perceptual and production skills as
a result of greater experience in the TL. Thus, adult Spanish L2 learners with 9 years of
experience (or exposure) in English in an immersion setting approximated the English-like
production of /t/-/d/ in word-final CVC context, as opposed to Spanish late L2 learners with a
significantly lower amount of experience in English – 0.4 years (Flege et al., 1992). In the
case of FL learning settings, Catalan/Spanish bilingual advanced learners of English discerned
voicing contrasts – i.e., /t/-/d/, /s/-/z/, and /t/-/d/ – at significantly higher correct rates after
an 80-hour period of formal instruction at their home university, whereas additional exposure
by means of a 3-month stay abroad period in English-speaking countries did not contribute to
a significantly consistent better discrimination rate of English voicing contrasts (Mora, 2007).
A rather precise account of these findings concerning differential experience outcomes
and, above all, starting age effects on Romance language NSs’ perception and production of
English voicing contrasts in word-final position lies in Flege’s (1995) Speech Learning Model
(SLM). Based on the assumption that “the phonetic systems used in the production and
perception of vowels and consonants remain adaptive over the life span, and that phonetic
systems reorganize in response to sounds encountered in an L2 through the addition of new
phonetic categories, or through the modification of old ones” (Flege, 1995, p. 233), the SLM
makes predictions about the perception and production of L2 vowels and consonants.
As for L2 consonant production and perception, the earlier learners start acquiring the
TL, the more likely they will be to detect phonetic differences between L1 and L2 consonant
segments, which then might lead to the establishment of additional phonetic categories for L2
consonants. However, in those instances where the phonetic distance between an L1
consonant and an L2 consonant is perceived to be smaller – by means of equivalence
classification (Flege, 1987) – L2 learners appear to produce the TL sounds with intermediate
values between the typical values of L1 sounds and those of L2 sounds (e.g., VOT in Flege,
1987; Flege & Eefting, 1987).
Similarly, the SLM also hypothesizes that learners who lack L2 consonant sounds in
certain allophonic positions in their L1 sound inventory will also be able to establish new
phonetic categories for those sounds. Therefore, it is predicted that learners will perceive and
produce L2 sounds accurately. Yet various studies have revealed that, in spite of lacking
voiced stops in word-final position, native Spanish and Italian late learners, among others,
failed to produce English voiced stops in word-final position (e.g., Flege et al., 1992, for
Spanish; Flege et al., 1995, for Italian). Moreover, and unlike predictions of the SLM about
improved discernment and production of L2 sounds as a function of experience, the various
amounts of experience of Spanish NSs and long-term exposure of native Italian participants
did not lead to a more accurate production of the distinctive feature of voicing.
Taking all of the above into account, the present study aimed to further investigate the
perception and production of voicing contrasts in English word-final obstruents /s/-/z/, /p/-/b/,
and /t/-/d/ by Romance language speakers, namely Catalan/Spanish advanced learners of
English in a formal learning setting. In addition, this study examined the effects of starting
age of FL learning (before vs. after 8 years) and experience in English (amount of formal
instruction and extracurricular exposure at home and/or overseas) on the perception and
production of English voicing contrasts.
2. Method
2.1 Participants
The participants in the present study were 48 non-native speakers of English (NNSs) learning
English as a foreign language in a formal instruction context. They were bilingual speakers of
Catalan and Spanish studying at university in Barcelona (33 female, 15 male) with a mean age
of 21.2 years and an advanced level of competence in English. All of them had learnt English
mainly through formal classroom instruction in Catalonia, but differed in starting age of FL
learning – i.e., onset age at or before 8 vs. onset age after 8 – and in amount of experience in
the L2 – i.e., school exposure only vs. extra exposure throughout schooling and/or through
short (< 3 months) stay-abroad periods (with or without a formal instruction component). A
group of 4 native speakers of Standard Southern British English (SSBE) (1 female, 3 male)
produced the stimuli for the construction of the perception and production tasks in the present
study and provided base-line data for the production task.
2.2 Materials
Learners’ accuracy in vowel and consonant perception and production was assessed,
respectively, through a categorial AXB discrimination task and a sentence imitation task using
_____________________
New Sounds 2007: Proceedings of the Fifth International Symposium on the Acquisition of Second Language Speech
- 208 -
a delayed repetition technique (Flege, Munro, & MacKay, 1995a). The present paper reports
on consonant data only (see Mora & Fullana, 2007, for results concerning the vowel data in
the study) and, in particular, it examines learners’ ability to perceive and produce voicing
contrasts in English word-final obstruents, specifically /s/-/z/, /p/-/b/, and /t/-/d/.
The consonant discrimination task was a forced-choice AXB test consisting of 72
triads (24 triads per contrast) with 0.5 inter-stimulus intervals (ISI) and 1.5 inter-trial intervals
(ITI) distributed into three blocks. Each block tested participants on a single consonant
contrast and contained 24 fully randomized triads per contrast: 4 orders (AAB, ABB, BAA,
BBA) × 2 minimal pairs per contrast (/s/-/z/: pace-pays, ice-eyes; /p/-/b/: rope-robe, tripe-tribe;
and /t/-/d/: mate-made, sight-side) × 3 repetitions each by a different speaker (1 female,
2 male). Diphthongs were chosen as the pre-consonantal context in order to maximize the
effect of consonant voicing on vowel duration. The delayed sentence repetition (DSR) task
consisted of a set of 12 mini-dialogues eliciting the 6 target consonant sounds in the same
words used in the perception task. The 12 elicited words (pace, pays, ice, eyes, rope, robe,
tripe, tribe, mate, made, sight, side) were embedded in the carrier phrase “I said ___” that the
participants imitated from an SSBE native speaker model (e.g., – What did you say? – I said
ice).
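The structure of the discrimination task described above can be sketched as follows. This is only an illustration of the 4 orders × 2 minimal pairs × 3 speakers design, not the script used in the study: the speaker labels are hypothetical, the sketch assumes that each repetition corresponds to one speaker, and the randomization of triads within a block is left out. The answer key follows the coding used on the answer sheet (A when X matches the first item, B when it matches the third).

```python
from itertools import product

# Orders and minimal pairs are taken from the task description.
ORDERS = ["AAB", "ABB", "BAA", "BBA"]
MINIMAL_PAIRS = {
    "/s/-/z/": [("pace", "pays"), ("ice", "eyes")],
    "/p/-/b/": [("rope", "robe"), ("tripe", "tribe")],
    "/t/-/d/": [("mate", "made"), ("sight", "side")],
}
# Hypothetical labels: the paper only states 1 female and 2 male speakers.
SPEAKERS = ["speaker_1", "speaker_2", "speaker_3"]


def correct_answer(order):
    """Return 'A' if X (the middle item) matches the first item, else 'B'."""
    return "A" if order[1] == order[0] else "B"


def build_block(contrast):
    """List every triad of one block: 4 orders x 2 pairs x 3 speakers."""
    triads = []
    for order, pair, speaker in product(ORDERS, MINIMAL_PAIRS[contrast], SPEAKERS):
        # Map each slot of the order (A or B) onto a member of the minimal pair.
        words = tuple(pair[0] if slot == "A" else pair[1] for slot in order)
        triads.append({
            "contrast": contrast,
            "order": order,
            "words": words,
            "speaker": speaker,
            "answer": correct_answer(order),
        })
    return triads


blocks = {contrast: build_block(contrast) for contrast in MINIMAL_PAIRS}
total_triads = sum(len(block) for block in blocks.values())
print(total_triads)
```

Each of the three blocks holds 24 triads under these assumptions, which gives the 72 triads reported for the task.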
The three English voicing consonant contrasts in word-final position were chosen
because the voicing contrast is lacking in this position at the phonetic level both in Spanish
and Catalan. Thus, whereas in English word-final obstruents may range from fully voiced to
completely devoiced and the preceding vowel is significantly shorter before voiceless than
voiced consonants, Catalan has a rule of terminal devoicing whereby underlyingly voiced
obstruents are realized as voiceless in word-final position (freda “cold” (fem) /frɛdə/-[frɛðə];
cf. fred “cold” (masc) [frɛt]). In other word positions, however, the pairs /s/-/z/, /p/-/b/ and
/t/-/d/ are phonologically contrastive (caça “hunting” /kasə/ vs. casa “house” /kazə/; pes
“weight” /pɛs/ vs. bes “kiss” /bɛs/; teu “yours” /tew/ vs. deu “god” /dew/). In Spanish, /d/
(phonetically dentialveolar [d]) is the only voiced oral stop which may occur in word-final
position (ciudad “city” /θiwdad/), but it is typically spirantized and realized as [ð]
([θiwðað]; or as [] or [], depending on dialectal variation). Spanish lacks voiced fricative
phonemes. A voiced alveolar fricative [z] is found as an allophone of /s/ before voiced
consonants (e.g., mismo [mizmo] “same”, desde [dezðe] “from”) but never in word-final
position. Consequently, Catalan/Spanish learners’ productions of English word-final /z/, /b/
and /d/ were expected to differ from those of English native speakers in two fundamental ways:
(a) frication noise (for /z/) and stop closure (for /b/ and /d/) often lacking vocal cord vibration,
and (b) a relatively small effect of stop voicing on the duration of preceding vowels. English
native speakers have been shown to rely mainly on vowel duration as a perceptual cue to the
voicing feature in word-final stops and fricatives, together with closure/frication duration,
voicing during closure/frication and F1 offset frequency as secondary cues. Because the effect
of consonant voicing on the duration of preceding vowels is larger in English than in
Catalan/Spanish, and /z/, /b/ and /d/ do not occur word-finally in Catalan/Spanish, it was
expected that learners would show some perceptual difficulty in discriminating the word-final
voicing contrasts /s/-/z/, /p/-/b/ and /t/-/d/.
2.3 Procedures
In the AXB discrimination task, participants were asked to identify the two words in the triad
with the same consonant by ticking the appropriate option on an answer sheet (A for AAB or
BBA triads; B for ABB or BAA triads). A set of AXB triads with minimal pairs not included
in the experimental task itself, but with the same ISI and ITI, was provided for practice. The
DSR task was administered individually in a sound-proof booth. The participants heard the
mini-dialogues over headphones and imitated the native speaker model’s phrase “I said __”.
Participants’ productions were digitally recorded (at 22.05 kHz, 16-bit) and computer-edited
for subsequent acoustic analysis with Praat (Boersma & Weenink, 2007). A practice set was
also provided at the beginning of the DSR task.
2.4 Analyses
Perceptual accuracy was assessed through an analysis of participants’ mean percent correct
discrimination scores on the AXB task. Accuracy in production was investigated through
acoustic analyses of participants’ productions of the target word-final consonants in the DSR
task, which included measures of vowel duration, consonant duration, presence/absence of
voicing during friction (for /s/ and /z/) and stop closure (for /p/, /b/, /t/ and /d/), and
presence/absence of a release burst (for the stop consonants). Duration measures were
obtained from sound waveforms and spectrograms generated by Praat. Vowel length was
obtained by placing a beginning cursor in the middle of the first peak of periodic energy
indicating the onset of the vowel and an end cursor at the last peak of periodic energy that
coincides with a drastic drop in energy indicating the beginning of the stop closure (or at the
last peak of periodic energy just before the beginning of the high-frequency noise
characteristic of /s/ and /z/). Consonant duration was measured from the beginning of the stop
closure (indicated by lack of energy on spectrograms) up to the beginning of the release burst,
thus measuring the duration of the closing phase of the stop rather than the duration of the
whole stop (which would also include the duration of the release burst and any subsequent
frication noise produced by aspiration). As the word-final consonants were elicited in a
sentence-final position context, reliable consonant duration measures could only be obtained
for fricatives (100%) and for those stops that were produced with a detectable release burst.
The proportion of sentence final stops produced with audible release varied according to place
of articulation and voicing: 49.2% of /b/s, 58.5% of /d/s (i.e., 53.85% of voiced stops on
average); 52.25% of /p/s and 73.25% of /t/s (i.e., 62.75% of voiceless stops on average).
Wilcoxon Signed Ranks tests were performed to explore differences in vowel and
consonant duration among minimal pairs within a given contrast. The effect of experience and
onset age of learning on learners’ ability to accurately perceive and produce the /s/-/z/, /p/-/b/,
and /t/-/d/ consonant contrasts was explored through Kruskal-Wallis and Mann-Whitney U
tests.
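For readers less familiar with the between-group test named above, a minimal sketch of the Mann-Whitney U statistic is given here. It is not the procedure run by the statistics package used in the study: it assumes average ranks for ties and a plain normal approximation for Z, with no tie or continuity correction, and the function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, z), where U is the smaller of the two U statistics.

    z uses the normal approximation without tie or continuity correction.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    u = min(u_a, u_b)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return u, (u - mean_u) / sd_u


# Made-up scores for two small groups, for illustration only.
u_stat, z_stat = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_stat, round(z_stat, 3))
```

With very small groups, such as the four native speakers in this study, an exact significance level is usually preferable to the normal approximation sketched here.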
Experience was further operationalized by means of two variables: extra exposure,
which groups together participants who had not been exposed to English outside school
versus those who had received extra exposure to English (mainly through regular attendance
at English courses offered by private language schools); and stays abroad, which divides the
pool of participants in the study into four groups according to whether they had benefited
from short-term stay abroad periods (never, ≤3 months, >3 months), with or without
simultaneous formal instruction (FI) abroad. Given the FL learning context of the
Catalan/Spanish learners of English in the present study, starting age divides the experimental
group into two age groups: onset age of FL learning at or before 8 years of age (range: 3–8,
M = 6.88, SD = 1.36) and after 8 (range: 9–12, M = 10.84, SD = 1.01).
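The grouping variables described above can be illustrated with a small sketch. The record fields, labels, and function names below are hypothetical and only mirror the cut-offs stated in the text (onset age at or before 8 vs. after 8; no stay abroad, stays of up to 3 months, and stays of more than 3 months, with or without formal instruction abroad). Which combinations actually occur depends on the sample.

```python
from dataclasses import dataclass


@dataclass
class Learner:
    # Hypothetical record layout, not the study's data format.
    onset_age: int             # age at which FL learning started
    extra_exposure: bool       # English outside regular schooling
    months_abroad: float       # total length of stay-abroad periods
    instruction_abroad: bool   # formal instruction (FI) during the stay


def starting_age_group(learner):
    """Split learners at the onset-age cut-off of 8 years."""
    return "at or before 8" if learner.onset_age <= 8 else "after 8"


def stay_abroad_group(learner):
    """Label the stay-abroad category from length of stay and FI abroad."""
    if learner.months_abroad == 0:
        return "never"
    length = "<=3 months" if learner.months_abroad <= 3 else ">3 months"
    course = "with FI" if learner.instruction_abroad else "without FI"
    return f"{length}, {course}"


example = Learner(onset_age=8, extra_exposure=True,
                  months_abroad=2, instruction_abroad=False)
print(starting_age_group(example), "|", stay_abroad_group(example))
```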
3. Results
3.1 AXB discrimination task
The overall percent correct discrimination scores for /s/-/z/, /p/-/b/, and /t/-/d/ contrasts
obtained on the AXB discrimination task were fairly similar among learners (see Table 1 and
Figure 1 below). A Friedman test performed on those scores showed that the three consonant
contrasts were indeed discerned at similarly high correct rates (χ² = 1.264, p = .532).
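The reported significance level can be verified by hand. A Friedman test over three related conditions refers its statistic to a chi-square distribution with k - 1 = 2 degrees of freedom, and for 2 degrees of freedom the upper-tail probability has the closed form exp(-χ²/2). The helper name below is illustrative.

```python
import math


def chi2_upper_tail_df2(chi2):
    """Upper-tail probability of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-chi2 / 2)


# Friedman statistic reported for the three contrasts.
p_value = chi2_upper_tail_df2(1.264)
print(round(p_value, 3))
```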
Table 1. Percent correct discrimination scores on the AXB task (NNSs)
Contrast    M       SD
/s/-/z/     93.40   6.53
/p/-/b/     92.44   7.86
/t/-/d/     92.18   8.27
Even so, male participants were found to perceive the three consonant contrasts at
higher correct rates than female participants (see Figure 1), resulting in significant differences
in /s/-/z/ and /t/-/d/ (U = 123, Z = –2.855, p < .05; and U = 162.5, Z = –1.949, p < .05, respectively)
and in near-significant differences in /p/-/b/ (U = 168, Z = –1.813, p = .070).
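[Figure 1 (plot not reproduced): percent correct discrimination, vertical axis 90 to 100, for /s/-/z/, /p/-/b/, and /t/-/d/ across All, Female, and Male participants]
Figure 1. Percent discrimination scores for all participants and as a function of gender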
As for the factors under study, the Mann-Whitney U tests and Kruskal-Wallis analyses
carried out failed to reveal any significant differences in discrimination scores according to
age of onset of FL learning, amount of exposure to English throughout schooling, and stays
abroad. At the most, differences in scores approached significance in the discernment of /p/-/b/
in favour of those participants with no extra exposure to English throughout schooling vs.
those participants with some degree of extra exposure (95.23 vs. 91.14; U = 150.5, Z = -1.801, p =
.072).
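The near-significant p values reported for the discrimination data are consistent with a two-tailed normal approximation of the Z statistic. The sketch below assumes that approximation (it is not stated in the text which significance level the statistics package reported) and uses an illustrative function name.

```python
import math


def two_tailed_p(z):
    """Two-tailed p value for a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Z values reported for the /p/-/b/ contrast (gender and extra exposure).
p_gender = two_tailed_p(-1.813)
p_exposure = two_tailed_p(-1.801)
print(round(p_gender, 3), round(p_exposure, 3))
```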
Despite the lack of significant differences, the following trends were observed. Firstly,
late starters tended to better discern the three voicing contrasts, in particular, /s/-/z/ and /p/-/b/
(see Figure 2). Moreover, those participants with school exposure only, as well as those
without any stay-abroad period, perceived those contrasts at higher correct rates (Figure 3).
Finally, those participants with a stay-abroad period longer than 3 months and with language
course(s) also discriminated the voicing contrasts more accurately (Figure 4).
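[Figure 2 (plot not reproduced): percent correct discrimination, vertical axis 90 to 100, for /s/-/z/, /p/-/b/, and /t/-/d/ by onset age group (≤ age 8, > age 8)]
Figure 2. Percent discrimination scores as a function of age of onset of FL learning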
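[Figure 3 (plot not reproduced): percent correct discrimination, vertical axis 90 to 100, for /s/-/z/, /p/-/b/, and /t/-/d/ by exposure group (no extra exposure, extra exposure)]
Figure 3. Percent discrimination scores as a function of exposure to English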
[Figure 4 (plot not reproduced): percent correct discrimination, vertical axis 87 to 100, for /s/-/z/, /p/-/b/, and /t/-/d/ by stay-abroad group (never; ≤ 3 months w/out course; ≤ 3 months w/course; > 3 months w/course)]
Figure 4. Percent correct discrimination scores as a function of stay-abroad periods
3.2 Delayed sentence repetition task
The statistical tests performed on the acoustic data obtained in the production task failed to
yield significant differences as a function of AOL and experience, suggesting that neither age
of onset of FL learning nor at-home (AH) and/or stay-abroad (SA) extracurricular exposure to English had a significant
effect on learners’ ability to accurately produce the English word-final voicing contrasts
examined in the present study, as measured through vowel and consonant duration and the
presence/absence of glottal pulsing and a release burst in the realization of the contrasting
stops.
As expected, the acoustic measurements revealed that NNSs’ realization of the voicing
contrasts examined differed from that of NSs in segmental length (see Table 2). NSs
produced a longer voiceless fricative (/s/) and a shorter voiced fricative (/z/) than NNSs. A
similar, less consistent, pattern was found in the duration of the closure phase of labial stops,
which was longer in /p/ in rope and shorter in /b/ in robe for NSs. The duration of the stop
closure, however, was always shorter for NSs than for NNSs in the labial stops in tripe and
tribe and the alveolar stops /t/ and /d/. NSs, however, are very consistent in producing a large
systematic difference in stop closure/friction duration between the voiced and voiceless
member of the contrasting pairs of consonants. This difference in duration is much smaller
and less consistent in NNSs’ productions (see Figure 5). In fact, the duration differences
between voiced and voiceless consonants are very small (0.61 ms for /p/-/b/ and 3.86 ms for /t/-/d/ on
average), except for the /s/-/z/ contrast (24.51 ms on average), where a more substantial
difference in duration is observed. These results suggest that NNSs make much less use of
consonantal duration as a phonetic cue to implement the voicing contrast in /s/-/z/, /p/-/b/ and
/t/-/d/. Mann-Whitney U tests revealed significant differences between NSs and NNSs in the
duration difference between voiceless and voiced contexts for fricatives and stop closures (see
Table 3), except for labial stops and the word pair mate-made (here the difference approached
significance: p= .059 (Z= -1.912); see also Figure 3).
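The size of the voiceless-minus-voiced duration difference can be recomputed from the group means in Table 2, as sketched below. The dictionary layout and function name are illustrative. For NSs the recomputed differences agree with the NS column of Table 3 to within rounding; for NNSs they need not match Table 3 exactly, presumably because the table values are averaged over per-speaker differences and not every stop could be measured (an assumption, not something stated in the text).

```python
# Mean fricative/stop closure durations (ms) transcribed from Table 2,
# stored per word pair as (voiceless word, voiced word) for each group.
CLOSURE_MS = {
    ("ice", "eyes"): {"NNSs": (195.24, 185.98), "NSs": (270.25, 153.50)},
    ("pace", "pays"): {"NNSs": (211.70, 171.91), "NSs": (249.50, 150.50)},
    ("rope", "robe"): {"NNSs": (83.56, 85.57), "NSs": (89.25, 57.75)},
    ("tripe", "tribe"): {"NNSs": (92.50, 89.67), "NSs": (85.00, 57.33)},
    ("mate", "made"): {"NNSs": (85.88, 80.64), "NSs": (64.25, 41.50)},
    ("sight", "side"): {"NNSs": (90.53, 88.04), "NSs": (71.50, 43.50)},
}


def voicing_difference(pair, group):
    """Voiceless minus voiced mean duration (ms) for one word pair and group."""
    voiceless, voiced = CLOSURE_MS[pair][group]
    return round(voiceless - voiced, 2)


ns_differences = {pair: voicing_difference(pair, "NSs") for pair in CLOSURE_MS}
nns_differences = {pair: voicing_difference(pair, "NNSs") for pair in CLOSURE_MS}

for pair in CLOSURE_MS:
    print("-".join(pair), nns_differences[pair], ns_differences[pair])
```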
Table 2. Segmental length differences between NNSs and NSs (SD in parentheses)
Duration (ms)       Preceding Vowel                      Fricative/Stop Closure
Contrast  Word      NNSs              NSs                NNSs              NSs
/s/-/z/   ice       259.45 (51.11)    208.25 (94.32)     195.24 (50.37)    270.25 (20.17)
          eyes      329.52 (54.24)    422.50 (79.77)     185.98 (50.27)    153.50 (18.26)
          pace      213.91 (43.09)    155.33 (8.32)      211.70 (58.93)    249.50 (33.67)
          pays      297.43 (63.31)    338.50 (40.71)     171.91 (56.27)    150.50 (30.11)
/p/-/b/   rope      184.07 (32.53)    154.75 (21.42)     83.56 (22.89)     89.25 (26.56)
          robe      244.94 (53.38)    293.25 (50.10)     85.57 (21.24)     57.75 (5.18)
          tripe     173.74 (32.25)    151.00 (35.34)     92.50 (19.66)     85.00 (10.44)
          tribe     235.46 (51.01)    285.00 (69.93)     89.67 (27.20)     57.33 (10.40)
/t/-/d/   mate      214.75 (36.49)    167.50 (29.80)     85.88 (43.91)     64.25 (24.79)
          made      273.09 (53.32)    297.25 (40.42)     80.64 (32.09)     41.50 (10.47)
          sight     225.21 (44.58)    195.75 (60.02)     90.53 (34.11)     71.50 (25.59)
          side      281.38 (55.83)    326.25 (66.62)     88.04 (26.65)     43.50 (5.06)
[Figure 5 (plot not reproduced): duration in ms, vertical axis 0 to 300, for NNSs and NSs; voiceless words (ice, pace, rope, tripe, mate, sight) and voiced words (eyes, pays, robe, tribe, made, side)]
Figure 5. Fricative and stop closure duration (ms)
[Figure 6 (plot not reproduced): duration in ms, vertical axis 0 to 450, for NNSs and NSs; voiceless words (ice, pace, rope, tripe, mate, sight) and voiced words (eyes, pays, robe, tribe, made, side)]
Figure 6. Vowel duration (ms)
Table 3. Stop closure duration differences before voiceless and voiced consonants according to
speaker group
ms            NNSs             NSs               Mann-Whitney U test
ice-eyes      6.17 (46.85)     116.75 (14.86)    U = 3.000; Z = -3.178; Sig. = .000
pace-pays     38.93 (56.95)    99.00 (25.62)     U = 31.000; Z = -2.155; Sig. = .029
rope-robe     1.66 (39.51)     31.50 (31.67)     U = 9.000; Z = -1.389; Sig. = .199
tripe-tribe   7.46 (16.18)     27.66 (18.77)     U = 7.000; Z = -1.682; Sig. = .111
mate-made     8.13 (48.19)     22.75 (17.28)     U = 18.000; Z = -1.912; Sig. = .059
sight-side    -1.36 (30.70)    28.00 (21.36)     U = 14.500; Z = -2.246; Sig. = .020
NSs’ and NNSs’ productions were also found to differ with respect to the duration of
the vowel preceding the oral stop (see Table 2). NSs produced shorter vowels before voiceless
stops and fricatives and longer vowels before voiced stops and fricatives than NNSs (see
Figure 6). This pattern appears to be systematic and without exception, suggesting that the
difference in the duration of the preceding vowel is always greater for NSs than for NNSs and
therefore NSs can make use of this difference in length as a cue to voicing. A series of Mann-Whitney
U tests revealed that the difference in vowel duration before voiced and voiceless
fricatives and oral stops is always significantly greater for NSs than for NNSs (see Table 4).
NNSs, like NSs, appear to make use of vowel duration as a cue to voicing and also
consistently produce significant differences in vowel length according to the voicing
specification of the following obstruent. The size of the difference in length, however, is
significantly smaller in NNSs’ production (see Figure 7).
Table 4. Vowel duration differences before voiceless and voiced consonants according to speaker
group
ms            NNSs             NSs               Mann-Whitney U test
ice-eyes      70.76 (58.64)    214.25 (34.75)    U = 3.000; Z = -3.189; Sig. = .001
pace-pays     84.69 (54.99)    193.66 (48.52)    U = 8.000; Z = -2.544; Sig. = .011
rope-robe     61.80 (55.18)    138.50 (42.06)    U = 10.000; Z = -2.530; Sig. = .011
tripe-tribe   59.68 (50.82)    134.00 (40.73)    U = 8.000; Z = -2.492; Sig. = .013
mate-made     57.55 (38.33)    129.75 (20.64)    U = 8.000; Z = -3.014; Sig. = .003
sight-side    54.47 (40.45)    130.50 (27.08)    U = 8.000; Z = -2.963; Sig. = .003
[Figure 7 (plot not reproduced): duration difference in ms, vertical axis 0 to 250, for NNSs and NSs; two panels (Vowel duration; Fricative and stop closure), each over the word pairs ice-eyes, pace-pays, rope-robe, tripe-tribe, mate-made, sight-side]
Figure 7. Duration differences between voiceless vs. voiced contexts
Acoustic analyses of the word-final obstruents produced by NNSs and NSs revealed
differences in the presence of vocal cord vibration during the production of fricatives and the
closure phase of stops as well as in whether the oral stop closures were unreleased or audibly
released (with or without friction noise following the release burst). Native speakers behaved
very consistently in that all underlyingly voiceless consonants were produced without vocal
cord vibration and all underlyingly voiced consonants were at least partially voiced.
Devoicing is a common feature of English word-final obstruents (e.g., Cruttenden 1994, p.
141), but it does not occur in the NSs’ productions analyzed here, probably due to the fact that
the NS informants were instructed to read the carrier phrases to be used in the imitation task
“carefully” at “normal speed” and the target word-final consonants always occurred in
absolute sentence-final position at the end of the carrier phrase. As regards the release phase of oral
stops, NSs produced all oral stops with audible release bursts and in voiceless stops the
release burst was typically followed by friction produced by the sudden release of the air
pressure at the opening of the oral closure. As expected, NNSs were much less consistent in
the use of voicing and in fact the percentage of underlyingly voiced consonants that were fully
devoiced was not very different from the percentage of underlyingly voiceless consonants that
were produced without vocal cord vibration. The fact that NNSs produced some occurrences
of voiceless /s/ (28.7%), /p/ (8%) and /t/ (2.1%) with vocal cord vibration suggests that these
might have been interpreted as being underlyingly voiced in the imitation task (/s/ in
particular), which would be indicative of perceptual problems when attending to the phonetic
cues that distinguish English voiced and voiceless obstruents in word-final position. The
results of the discrimination task, however, do not seem to confirm this hypothesis, as overall
correct discrimination scores were at ceiling and little difference is found across contrasting
sounds in mean percent correct discrimination (see Table 1).
[Figure 8 (plot not reproduced): percentage, vertical axis 0 to 100, of underlyingly voiceless vs. underlyingly voiced obstruents for /s/-/z/, /p/-/b/, and /t/-/d/]
Figure 8. Percentage of word-final obstruents realized without vocal cord vibration
Thus, underlyingly voiced stops are only slightly less frequently realized without
vocal cord vibration than underlyingly voiceless consonants (see Figure 8) and the /s/-/z/
contrast appears to be virtually non-existent with respect to voicing. A similar pattern is
obtained when the percentage of audible release bursts followed by friction noise is calculated
for voiced and voiceless consonants, which is only slightly higher for voiceless than for
voiced consonants, as opposed to the pattern observed in the NSs’ productions, which present
a huge contrast between voiced (0%) and voiceless stops (81.25% on average).
The segmental duration differences observed between NNSs’ and NSs’ productions
suggest that NNSs succeed in making use of the duration differences in vowel length resulting
from the voicing specification of the following consonant as a cue to voicing, but do not do so
to the extent that NSs do. The significantly much shorter vowel duration measures obtained in
NNSs’ productions are not compensated for by producing greater differences than NSs in
fricative and stop closure duration between voiced and voiceless consonants, or by
consistently making use of closure voicing in voiced consonants. This confirms that the word-final
voicing contrasts examined are less robustly realized by NNSs than by NSs. Therefore,
we would predict a positive effect of greater experience in the L2 and an earlier onset of L2
learning in terms of a more robust native-like realization of the voicing contrasts in word-final
position through a significant increase in segmental length differences, less devoicing in
voiced consonants, absence of closure voicing in voiceless consonants and greater frequency
of voiceless stops released with audible friction.
In order to explore the effect of onset age of L2 learning and experience on segmental
duration in the realization of the voicing contrast in word final obstruents, duration measures
were submitted to a series of independent-sample tests (Mann-Whitney U and Kruskal-Wallis)
with the following subject-grouping variables as factors: age of onset of L2 learning
(AOL), extra exposure to English (EE), stays abroad (SA), use of English during the SA
(SAU) and gender (G). In general, the analyses failed to reveal significant differences
between the groups defined by the factors under examination, but some statistically
significant differences in segmental duration were found for specific sound contrasts and
some general tendencies could be observed (see Table 5).
Table 5. Segmental duration (ms) according to EE (SD in parentheses)
                  Vowel Duration                       V. Duration Difference             Consonant Duration
Contrast  Word    NO extra exposure  Extra exposure    NO extra exposure  Extra exposure  NO extra exposure  Extra exposure
/s/-/z/   ice     262.36 (52.18)     253.68 (49.62)    92.42 (38.23)      61.67 (65.77)   192.43 (47.22)     196.48 (54.25)
          eyes    354.79 (55.42)     314.78 (49.19)                                       188.71 (57.08)     181.31 (47.19)
          pace    214.64 (51.87)     210.23 (38.32)    91.53 (72.89)      81.64 (48.78)   218.29 (68.65)     209.23 (56.74)
          pays    307.31 (74.47)     289.91 (59.08)                                       190.08 (68.80)     161.59 (49.52)
/p/-/b/   rope    187.00 (36.77)     181.00 (31.05)    51.10 (32.27)      75.57 (63.58)   89.88 (19.90)      77.25 (25.21)
          robe    238.00 (38.70)     250.58 (61.21)                                       83.20 (12.89)      86.89 (25.39)
          tripe   172.07 (43.38)     171.89 (24.58)    75.78 (61.78)      53.56 (41.53)   102.25 (15.96)     91.12 (20.47)
          tribe   247.86 (58.09)     229.77 (48.44)                                       101.00 (32.55)     82.62 (23.71)
/t/-/d/   mate    230.93 (40.30)     206.22 (33.21)    60.28 (36.98)      57.32 (38.88)   89.78 (23.33)      84.35 (50.11)
          made    291.21 (58.17)     264.45 (50.81)                                       91.88 (26.72)      72.61 (31.83)
          sight   232.43 (47.86)     216.87 (39.08)    63.84 (34.11)      50.25 (44.49)   95.27 (28.41)      86.23 (35.05)
          side    303.00 (61.23)     265.56 (47.53)                                       112.38 (19.20)     77.06 (22.97)
For example, no significant segmental duration differences were found between
subject groups according to AOL and no consistent pattern could be observed for vowel
duration. However, with the exception of /z/, earlier starters (AOL before 8) were found to
produce slightly shorter voiced consonants and slightly longer voiceless consonants than later
starters (AOL after 8). This native-like trend, however, is not fully confirmed by the duration
measures obtained for vowels, as it is not always the case that vowel duration differences
according to the voicing value of the following consonant are greater for earlier starters: this
is the case for ice-eyes, rope-robe (p=.047; Z=-1.985), and tripe-tribe, but not for the
remaining word pairs (see Figure 9).
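As a rough consistency check, a Z statistic of the kind reported in this section can be converted to a two-tailed p value under a standard normal approximation. The sketch below assumes that approximation, since the test behind the Z values is not named in this passage; the function name two_tailed_p is illustrative only.

```python
from math import erfc, sqrt


def two_tailed_p(z):
    """Two-tailed p value for a Z statistic under the standard normal distribution."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the probability in both tails.
    return erfc(abs(z) / sqrt(2))


# Z value reported above for the rope-robe vowel duration difference.
p_rope_robe = two_tailed_p(-1.985)
print(f"rope-robe: Z = -1.985, two-tailed p = {p_rope_robe:.3f}")
```

Under this assumption the result is close to the p=.047 reported for rope-robe.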
[Figure 9: bar chart comparing the AOL <8 and AOL >8 groups (y-axis: 0-250 ms) for voiced targets (eyes, pays, robe, tribe, made, side) and voiceless targets (ice, pace, rope, tripe, mate, sight).]
Figure 9. Consonant duration (ms)
New Sounds 2007: Proceedings of the Fifth International Symposium on the Acquisition of Second Language Speech
[Figure 10: bar chart comparing the NO EE and EE groups (y-axis: 0-450 ms) for voiced targets (eyes, pays, robe, tribe, made, side) and voiceless targets (ice, pace, rope, tripe, mate, sight).]
Figure 10. Vowel duration (ms) as a function of extra exposure
Vowel duration (except for robe) was found to be slightly shorter for those
participants who had received extra exposure to English outside school than for those who
had not. This vowel duration difference, however, cannot be interpreted as implying a gain in
production accuracy because (a) it affected voiced and voiceless consonants similarly,
reaching significance in the case of eyes (/z/: p=.045; Z=-0.242) and approaching significance
in mate and side (/t/: p=.077; Z=-1.767; /d/: p=.067; Z=-1.834) and (b) the difference in
vowel duration between voiced and voiceless contexts (except for the word pair rope-robe)
tended to be smaller for the group that received extra exposure, reaching significance for the
ice-eyes word pair (p=.05; Z=-1.962) (see Figure 10). All of this suggests, contrary to our
predictions, that the extra-exposure group would be less proficient than the no-extra-exposure
group in making use of vowel duration differences to signal voicing in word-final obstruents.
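The comparison in (b) can be illustrated with the group means for vowel duration in Table 5. The sketch below is only an approximation of the reported analysis: it subtracts group means, whereas the differences in Table 5 appear to be averaged over speakers, so the figures differ slightly. The names VOWEL_MS, PAIRS and voicing_effect are illustrative and not part of the study.

```python
# Group mean vowel durations (ms) read from Table 5:
# word -> (no extra exposure, extra exposure).
VOWEL_MS = {
    "ice": (262.36, 253.68), "eyes": (354.79, 314.78),
    "pace": (214.64, 210.23), "pays": (307.31, 289.91),
    "rope": (187.00, 181.00), "robe": (238.00, 250.58),
    "tripe": (172.07, 171.89), "tribe": (247.86, 229.77),
    "mate": (230.93, 206.22), "made": (291.21, 264.45),
    "sight": (232.43, 216.87), "side": (303.00, 265.56),
}

# (voiceless, voiced) members of each minimal pair.
PAIRS = [("ice", "eyes"), ("pace", "pays"), ("rope", "robe"),
         ("tripe", "tribe"), ("mate", "made"), ("sight", "side")]


def voicing_effect(voiceless, voiced):
    """Voiced-minus-voiceless vowel duration (ms) for each exposure group."""
    no_ee = VOWEL_MS[voiced][0] - VOWEL_MS[voiceless][0]
    ee = VOWEL_MS[voiced][1] - VOWEL_MS[voiceless][1]
    return round(no_ee, 2), round(ee, 2)


effects = {f"{a}-{b}": voicing_effect(a, b) for a, b in PAIRS}

# Pairs in which the extra-exposure group shows the larger voicing effect.
larger_with_ee = [pair for pair, (no_ee, ee) in effects.items() if ee > no_ee]

for pair, (no_ee, ee) in effects.items():
    print(f"{pair:12s} no EE: {no_ee:6.2f}   EE: {ee:6.2f}")
print("Larger with extra exposure:", larger_with_ee)
```

On these means, rope-robe is the only pair for which the extra-exposure group shows the larger difference, which matches the exception noted above.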
The analysis of consonant duration data shows a similar pattern of results, with the
extra-exposure group in general producing slightly shorter voiced and voiceless consonants
than the no-extra-exposure group (except for made). It would seem that extra exposure has the
effect of slightly reducing segmental duration in general, both in vowels and consonants and
irrespective of the voicing specification of the word-final obstruents.
When segmental duration was analyzed according to extra exposure to English
obtained through stay-abroad periods (never, ≤3 months without FI, ≤3 months with FI, >3
months with FI), no significant differences were found. In this case, no consistent pattern or
general tendency could be observed (see Table 6). However, as observed for those subjects
who had received extra exposure to English outside school, there is a slight tendency for
subjects reporting a greater amount of L2 use abroad (75% vs. 50% or 25%) to produce both
vowels and consonants with shorter durations irrespective of their voicing value, except for
robe and the closure duration of alveolar stops, which is longer for those subjects reporting
75% of everyday use of English while abroad.
Table 6. Segmental duration (ms) as a function of SAU (SD in parentheses)

Vowel duration
Contrast   Word     ≤ 25%            50%              ≥ 75%
/s/-/z/    ice      272.36 (56.14)   266.09 (55.89)   232.45 (43.24)
           eyes     329.00 (50.28)   324.55 (51.54)   316.36 (57.26)
           pace     216.55 (47.28)   206.00 (50.84)   200.55 (35.26)
           pays     316.73 (68.81)   279.91 (62.96)   290.45 (57.40)
/p/-/b/    rope     193.33 (45.15)   189.88 (29.48)   169.86 (17.29)
           robe     242.00 (36.59)   234.83 (46.01)   256.88 (81.96)
           tripe    175.73 (29.51)   179.11 (26.17)   149.44 (31.53)
           tribe    230.64 (55.08)   258.50 (70.86)   224.80 (36.45)
/t/-/d/    mate     218.64 (40.48)   212.00 (42.06)   208.18 (23.83)
           made     272.18 (50.13)   272.91 (54.52)   264.00 (60.99)
           sight    219.18 (44.46)   232.73 (51.42)   210.20 (22.83)
           side     274.09 (43.48)   265.00 (62.66)   269.57 (38.17)

Vowel duration difference
Contrast   Word pair     ≤ 25%            50%              ≥ 75%
/s/-/z/    ice-eyes      56.63 (49.74)    58.45 (64.16)    83.90 (66.85)
           pace-pays     100.18 (52.82)   79.00 (64.13)    89.90 (48.28)
/p/-/b/    rope-robe     47.75 (32.99)    46.66 (25.25)    105.66 (83.50)
           tripe-tribe   54.90 (34.01)    87.87 (80.99)    68.50 (43.91)
/t/-/d/    mate-made     53.54 (37.21)    60.90 (43.92)    52.80 (44.15)
           sight-side    54.90 (23.06)    32.60 (48.55)    65.57 (39.02)

Consonant duration
Contrast   Word     ≤ 25%            50%              ≥ 75%
/s/-/z/    ice      201.30 (35.91)   201.64 (62.42)   173.80 (53.27)
           eyes     200.82 (52.62)   177.00 (44.80)   166.45 (57.22)
           pace     217.73 (49.86)   207.20 (77.78)   194.27 (56.31)
           pays     161.18 (39.08)   180.82 (72.41)   153.09 (52.70)
/p/-/b/    rope     90.75 (27.03)    78.20 (19.98)    61.33 (13.05)
           robe     82.00 (18.08)    83.00 (2.82)     93.60 (29.89)
           tripe    102.75 (18.00)   93.00 (22.71)    97.25 (18.24)
           tribe    99.00 (27.64)    83.33 (36.29)    97.60 (15.91)
/t/-/d/    mate     77.00 (29.28)    94.80 (97.24)    103.75 (25.59)
           made     77.83 (26.28)    56.50 (8.50)     98.50 (28.24)
           sight    90.90 (39.64)    68.29 (25.81)    105.56 (37.02)
           side     83.75 (29.64)    74.20 (21.05)    104.33 (27.38)

As far as gender is concerned, female speakers tended to produce slightly longer
vowels in all contexts (except for ice and robe), but none of these differences turned out to be
significant, except for the /z/ in pays, which almost reached significance (p=.051; Z=-1.952).
Female speakers also produced slightly greater (but non-significant) differences according to
context (voiced vs. voiceless consonant) in the alveolar fricative and oral stop sound contrasts.
Except for labials and the /d/ in made, female participants also produced longer consonants
than males, irrespective of voicing.
The effect of age of onset of L2 learning and experience on the realization of the
voicing contrast in word-final obstruents with respect to the presence of vocal cord vibration
(during friction and the closure phase of oral stops) and the presence of a release burst
followed by audible friction was explored through Pearson χ2 and Fisher’s Exact Probability
tests with the same subject-grouping variables as factors that were used to examine segmental
duration – i.e., AOL, EE, SA, SAU, and G. In general, the analyses failed to reveal significant
differences between the groups defined by these factors, suggesting that neither AOL nor
experience had any effect on subjects' ability to accurately produce the English voicing
contrast in word-final obstruents.
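For readers unfamiliar with Fisher's Exact Probability test, the following is a minimal sketch of the two-sided version for a 2x2 table, written with the standard library only. The counts are hypothetical and are not taken from the study; the function name fisher_exact_two_sided and the row and column labels are likewise assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, NOT from the study.
# Rows: earlier vs. later starters; columns: tokens with vs. without voicing.
p_value = fisher_exact_two_sided(8, 4, 3, 9)
print(f"two-sided p = {p_value:.4f}")
```

With these invented counts the p value is above .05, so the two groups would not be judged to differ, which is the kind of outcome reported above.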
4. Discussion and conclusions
The present study examined the perception and production of voicing contrasts in English
word-final obstruents – that is, /s/-/z/, /p/-/b/, and /t/-/d/ – by 48 Catalan/Spanish learners of
English varying in age of onset of FL learning and experience in the TL in a formal learning
context.
Concerning the AXB discrimination task, neither starting age – before and after age 8
– nor experience – extra exposure and/or stays abroad – had a significant effect on the correct
discrimination scores for the three consonant contrasts. What is more, learners discriminated the
three contrasts at similarly high rates of correct responses, which is at odds with findings for an
analogous population in Mora (2007), where Catalan/Spanish advanced learners of English perceived
/t/-/d/ more accurately than /s/-/z/. When taken together with findings in Mora and Fullana
(2007), the learners in this study also differed from those in Mora (2007) in that the 48
Catalan/Spanish learners of English discriminated English consonant contrasts better than
vowel contrasts (M = 92.67 and 86.85, respectively).
Despite these differences and the lack of significant effects of the research variables,
the observed tendency toward a late starting age advantage in the perception of voicing contrasts is
in line with findings from formal learning contexts (e.g., Gallardo & García-Lecumberri, 2006).
Moreover, the somewhat better performance on the part of learners without extracurricular
exposure or with longer stay-abroad periods corroborates the mixed exposure effects so far
reported for both immersion and formal learning settings (e.g., Cebrian, 2006; Flege et al.,
1992; Mora, 2007).
Additional non-significant starting age and exposure effects were obtained for
learners’ production of English /s/-/z/, /p/-/b/, and /t/-/d/. However, unlike in the perception
results, earlier starters, rather than late starters, resembled English NSs more closely by producing
slightly shorter voiced and longer voiceless consonants in word-final position, in addition to
greater vowel duration differences in some word pairs, which agrees with the expected
earlier starting age advantage in naturalistic settings. However, an increase in exposure by
means of extracurricular language courses or stay-abroad periods did not result in learners’
consistent use of vowel duration differences as a cue to voicing in word-final obstruents.
Thus, the latter finding further corroborates the inconclusive experience effects noted for
Catalan/Spanish speakers’ perception of English voicing contrasts.
In addition, findings of learners’ nonnative-like production of voicing contrasts that do not exist
in their L1 did not conform to the predictions of the SLM (Flege, 1995). Furthermore,
Catalan/Spanish speakers’ realization of underlyingly voiced stops without vocal cord
vibration at rates similar to those of underlyingly voiceless stops, and absence of voicing in
voiced fricatives, agrees with production difficulties observed for Romance language speakers
(Cebrian, 2000; Flege et al., 1995b).
In sum, in the current FL learning setting, age of onset of FL learning and exposure to
the TL failed to determine Catalan/Spanish bilinguals’ perception and production of voicing
contrasts of English word-final /s/-/z/, /p/-/b/, and /t/-/d/. The findings of the present study
further point to the need to address input quality – particularly, specific formal instruction that
deals with the perception and production of English sounds – when assessing starting age and
experience effects.
Acknowledgements
This study was funded by Grant HUM2004-05167/FILO from the Ministerio de Ciencia y
Tecnología in Spain. Part of this work was also made possible thanks to a postdoctoral
fellowship to the first author by the Secretaría de Estado de Universidades e Investigación del
Ministerio de Educación y Ciencia (Spain). The authors would like to thank the Laboratori de
Fonètica and Laboratori de Fonètica Aplicada at the Universitat de Barcelona, Anna Maria
Agustí, Cristina Aliaga and Eva Cerviño for help with stimulus preparation and data
collection, as well as all the participants in the study.
References
Boersma, P., & Weenink, D. (2007). Praat: Doing phonetics by computer (Version 4.6.05) [Computer
program]. Retrieved June 2, 2007, from http://www.praat.org/
Cebrian, J. (2000). Transferability and productivity of L1 rules in Catalan-English interlanguage.
Studies in Second Language Acquisition, 22, 1-26.
Cebrian, J. (2006). Experience and the use of non-native duration in L2 vowel categorization. Journal
of Phonetics, 34, 372-387.
Flege, J. E. (1988). The development of skill in producing word-final English stops: Kinematic
parameters. Journal of the Acoustical Society of America, 84, 1639-1652.
Flege, J. E. (1989). Chinese subjects’ perception of the word-final /t/-/d/ contrast: Performance before
and after training. Journal of the Acoustical Society of America, 86, 1684-1697.
Flege, J. E. (1993). Production and perception of a novel, second-language phonetic contrast. Journal
of the Acoustical Society of America, 93, 1589-1608.
Flege, J. E., & Davidian, R. D. (1984). Transfer and developmental processes in adult foreign
language speech production. Applied Psycholinguistics, 5, 323-347.
Flege, J. E., & Eefting, W. (1987). Production and perception of English stops by native Spanish
speakers. Journal of Phonetics, 15, 67-83.
Flege, J. E., & Hillenbrand, J. (1986). Differential use of temporal cues to the /s/-/z/ contrast by native
and non-native speakers of English. Journal of the Acoustical Society of America, 79, 508-517.
Flege, J. E., & Port, R. (1981). Cross-language phonetic interference: Arabic to English. Language
and Speech, 24, 125-146.
Flege, J. E., & Wang, C. (1989). Native language phonotactic constraints affect how well Chinese
subjects perceive the word-final English /t/-/d/ contrast. Journal of Phonetics, 17, 299-315.
Flege, J. E., McCutcheon, M. J., & Smith, S. C. (1987). The development of skill in producing
word-final English stops. Journal of the Acoustical Society of America, 82, 433-447.
Flege, J. E., Munro, M. J., & MacKay, I. R. A. (1995a). Factors affecting strength of perceived foreign
accent in a second language. Journal of the Acoustical Society of America, 97, 3125-3134.
Flege, J. E., Munro, M. J., & MacKay, I. R. A. (1995b). Effects of age of second-language learning on
the production of English consonants. Speech Communication, 16, 1-26.
Flege, J. E., Munro, M. J., & Skelton, L. (1992). Production of the word-final English /t/-/d/ contrast
by native speakers of English, Mandarin and Spanish. Journal of the Acoustical Society of
America, 92, 128-143.
Gallardo, F., & García-Lecumberri, M. L. (2006). Age effects on single phoneme perception for
learners of English as a foreign language. In C. Abello-Contesse, R. Chacón-Beltrán, M. D.
López-Jiménez, & M. M. Torreblanca-López (Eds.), Age in L2 acquisition and teaching (pp.
115-131). Bern, Switzerland: Peter Lang.
MacKay, I., Meador, D., & Flege, J. E. (2001). The identification of English consonants by native
speakers of Italian. Phonetica, 58, 103-125.
McAllister, R. (2007). Strategies for realization of L2-categories. English /s/-/z/. In O.-S. Bohn, & M.
J. Munro (Eds.), Language experience in second language speech learning. In honor of James
Emil Flege (pp. 153-166). Amsterdam: John Benjamins.
Mora, J. C. (2007). Learning context effects on the acquisition of a second language phonology. In C.
Pérez-Vidal, M. Juan-Garau, & A. Bel (Eds.), A portrait of the young in the new multilingual
Spain (pp. 241-263). Clevedon: Multilingual Matters.
Mora, J. C., & Fullana, N. (2007). Production and perception of English /i/-/ɪ/ and /æ/-/ʌ/ in a formal
setting: Investigating the effects of experience and starting age. Proceedings of the 16th
International Congress of Phonetic Sciences (pp. 1613-1616). Saarbrücken, Germany.
Wang, Y., & Behne, D. (2007). Temporal remnants from Mandarin in nonnative English speech. In
O.-S. Bohn, & M. J. Munro (Eds.), Language experience in second language speech learning. In
honor of James Emil Flege (pp. 153-166). Amsterdam: John Benjamins.
Yavaş, M. (1994). Final stop devoicing in interlanguage. In M. Yavaş (Ed.), First and second
language phonology (pp. 267-282). San Diego, CA: Singular.