The Perception of Lexical Stress in Spanish

15th ICPhS Barcelona
Joaquim Llisterri, María Machuca, Carme de la Mota, Montserrat Riera, Antonio Ríos
Universitat Autònoma de Barcelona
E-mail: {joaquim | maria | carme | montse | mestre}@liceu.uab.es
ABSTRACT
As in other languages, stress in Spanish is signalled by
three simultaneous acoustic cues: fundamental frequency
(F0), duration and intensity. In this experiment, the role of
these parameters in the perception of lexical stress in
isolated words has been studied using natural resynthesised
speech. Results show that the F0 contour alone is not
enough to allow the identification of the stressed syllable of
a word. However, in combination with duration, intensity
or both duration and intensity, F0 is a relevant acoustic cue
for the perception of lexical stress. On the other hand,
intensity and duration, either combined or in isolation, are
not sufficient for the identification of the stressed syllable
within a word.
1. INTRODUCTION
It is generally acknowledged that, as in other languages,
stress in Spanish is signalled by three simultaneous acoustic
cues: fundamental frequency (F0), duration and intensity.
An earlier perceptual study using synthetic stimuli [1]
concluded that, in Spanish, F0 is the only parameter
systematically related to the identification of the stressed
syllable of a word, while the role of duration depends on the
stress pattern. The experiments with natural resynthesised
speech reported in [2, 3] indicated that a replacement of the
F0 contour is not enough to induce the identification of the
stressed syllable if the other two parameters are not
modified.
Taking advantage of the fact that Spanish is a free accent
language –i.e. lexical stress can appear in any syllable of
the word– a perceptual experiment with natural
resynthesised speech has been designed to assess the role of
F0, duration and intensity in the identification of the
stressed syllable in isolated words, using lexical items with
the same segmental content but with differences in stress
placement. To take into account the role of lexical
knowledge, phonologically acceptable but non-existent
words have also been included in the test corpus. The
contribution of each acoustic cue has been examined both
in isolation and in combination with other cues.
2. EXPERIMENTAL PROCEDURE
The primary data for the experiment were extracted from the
analysis of a corpus of isolated words read by a native
speaker of Castilian Spanish. The recordings were then
manipulated to obtain the test stimuli following the
procedure explained below.
The recorded corpus consisted of four meaningful
three-syllable words with constant CV structure, allowing the
stress to be placed on the first (proparoxytone), second
(paroxytone) or final (oxytone) syllable (número, numero,
numeró; límite, limite, limité; médico, medico, medicó;
válido, valido, validó), and four meaningless words in
which the position of the stress was also varied
(*núlibo, *nulibo, *nulibó; *ládebo, *ladebo, *ladebó;
*máledo, *maledo, *maledó; *lúguido, *luguido,
*luguidó).
The 240 target words (10 repetitions of 8 words x 3 stress
patterns) were analysed using the Praat software (© P.
Boersma & D. Weenink, Institute of Phonetics, University
of Amsterdam, http://www.praat.org). F0 was measured at
the beginning, centre and end of each of the three vowels in
the word. Intensity values were obtained from five
equidistant points within each vowel. Vowel duration was
also measured. Mean values for the 10 repetitions of each
word were computed and used to create the set of basic
stimuli.
Several modifications were then performed to obtain the
stimuli used in the perceptual test. In words with lexical
stress on the first syllable (such as válido), the mean F0,
duration and intensity values for each vowel were replaced
by the mean F0, duration and intensity values found in the
equivalent word with stress on the second syllable (such as
valido). Likewise, in words with lexical stress on the second
syllable (such as valido), the F0, duration and intensity
values for each vowel were replaced by the values found in
the corresponding words with lexical stress on the final
syllable (such as validó). Oxytone words were not
manipulated, to avoid shifting an F0 peak or duration and
intensity values across a word boundary.
Figures 1 and 2 show an example of the manipulation of a
single parameter: the F0 of the vowels in the words válido
and valido is replaced by the F0 of the corresponding
vowels in valido and validó respectively, while the original
duration and intensity are maintained.
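The replacement of per-vowel mean values described in the
experimental procedure can be illustrated with a short sketch.
It is only a schematic illustration, not the authors' Praat
procedure: the class VowelMeans, the function superimpose and
all numeric values are hypothetical placeholders, and no audio
resynthesis is performed.

```python
from dataclasses import dataclass, replace
from typing import List, Set, Tuple


@dataclass(frozen=True)
class VowelMeans:
    """Mean acoustic values of one vowel (hypothetical container)."""
    f0_hz: Tuple[float, float, float]  # beginning, centre, end of the vowel
    duration_ms: float
    intensity_db: Tuple[float, ...]    # five equidistant points


def superimpose(target: List[VowelMeans], donor: List[VowelMeans],
                cues: Set[str]) -> List[VowelMeans]:
    """Copy the selected cues ("F0", "D", "I") from donor to target."""
    unknown = cues - {"F0", "D", "I"}
    if unknown:
        raise ValueError(f"unknown cues: {sorted(unknown)}")
    if len(target) != len(donor):
        raise ValueError("both words must have the same number of vowels")
    return [
        replace(
            t,
            f0_hz=d.f0_hz if "F0" in cues else t.f0_hz,
            duration_ms=d.duration_ms if "D" in cues else t.duration_ms,
            intensity_db=d.intensity_db if "I" in cues else t.intensity_db,
        )
        for t, d in zip(target, donor)
    ]


# Placeholder numbers for illustration only (NOT measurements from the study).
valido_pp = [  # stands for proparoxytone "válido"
    VowelMeans((220.0, 230.0, 210.0), 110.0, (70.0, 72.0, 73.0, 72.0, 70.0)),
    VowelMeans((190.0, 180.0, 170.0), 70.0, (66.0, 67.0, 67.0, 66.0, 65.0)),
    VowelMeans((160.0, 150.0, 140.0), 90.0, (62.0, 63.0, 63.0, 62.0, 60.0)),
]
valido_p = [  # stands for paroxytone "valido"
    VowelMeans((180.0, 185.0, 190.0), 65.0, (65.0, 66.0, 66.0, 65.0, 64.0)),
    VowelMeans((215.0, 225.0, 205.0), 105.0, (69.0, 71.0, 72.0, 71.0, 69.0)),
    VowelMeans((165.0, 155.0, 145.0), 95.0, (62.0, 63.0, 63.0, 62.0, 61.0)),
]

# "PP with P values", F0 only: duration and intensity stay untouched.
f0_only = superimpose(valido_pp, valido_p, {"F0"})
all_cues = superimpose(valido_pp, valido_p, {"F0", "D", "I"})
```

Passing a different set of cues to superimpose yields the
other single, paired or triple manipulations.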
ISBN 1-876346-48-5 © 2003 UAB
[Figure 1, panels: (a) válido with its original F0 contour;
(b) válido with the F0 contour extracted from valido.]
Figure 1: Waveform, F0 (black line) and intensity (grey line)
for the word válido (a) and waveform, F0 and intensity after
superimposing the F0 contour of the word valido (b).
[Figure 2, panels: (a) valido with its original F0 contour;
(b) valido with the F0 contour extracted from validó.]
Figure 2: Waveform, F0 (black line) and intensity (grey line)
for the word valido (a) and waveform, F0 and intensity after
superimposing the F0 contour of the word validó (b).
To create the test stimuli, each word was resynthesised with
the replaced values using PSOLA as implemented in Praat.
The values of each acoustic parameter (F0, duration and
intensity) were first modified individually, maintaining the
original values of the other two parameters. Then the values
for two parameters were superimposed together (F0 and
duration, intensity and duration, and F0 and intensity),
maintaining the original values of the third. Finally, the
values of the three parameters were modified simultaneously
by replacing all the original values. This strategy allowed
the study of the perceptual effects of each acoustic cue both
in isolation and in combination with others.
Two different tasks were given to the subjects who
participated in the experiment. In the first one (test 1),
they were asked to identify the syllable bearing the stress
(the first, the second or the last) in a total of 336
isolated words. In the second task (test 2), subjects were
asked whether 280 pairs of words were equal or different in
their stress pattern. Within each test, stimuli were presented
in random order.
The tests were administered through individual headphones
at the Language Laboratory of the Department of French
and Romance Philology at the Universitat Autònoma de
Barcelona. Subjects were given written instructions as well
as an oral briefing; they were warned that the test contained
both existent and non-existent words, and that blank replies
were not allowed. A set of five training stimuli was included
at the beginning of each test, and questions on the procedure
were taken after each training period. As there were more
than 600 stimuli, the test was divided into two sessions: one
in which stimuli with modifications in a single acoustic cue
were presented, and another in which stimuli with cues in
combination were used. To avoid listener fatigue, breaks were
introduced in each session, during which simple distracting
activities were carried out. Thirty speakers of Spanish,
students at the Universitat Autònoma de Barcelona aged
between 18 and 45, responded to the test. A total of 18480
replies were obtained.
3. RESULTS
Resynthesised natural items do not present special
problems to listeners when they are asked to identify the
stress pattern. For stimuli without manipulation of the
acoustic parameters (i.e. with the averaged values obtained
from the reference speaker), correct identification of stress
placement ranges from 92.97% to 100% in meaningful
words and from 91.41% to 100% in meaningless words.
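The size of the data set reported in the experimental
procedure can be cross-checked with simple arithmetic. The
sketch below assumes that every listener answered every
stimulus of both tests, which is consistent with the rule
that blank replies were not allowed; the function names are
illustrative only.

```python
# Figures quoted in the experimental procedure.
LISTENERS = 30
TEST1_STIMULI = 336      # isolated words, stress identification
TEST2_STIMULI = 280      # word pairs, same/different judgement
REPORTED_REPLIES = 18480
REPORTED_RECORDED_WORDS = 240


def expected_replies(listeners: int, *stimuli_per_test: int) -> int:
    """Replies obtained if each listener answers every stimulus once."""
    return listeners * sum(stimuli_per_test)


def recorded_tokens(repetitions: int, words: int, stress_patterns: int) -> int:
    """Number of recorded target words in the corpus."""
    return repetitions * words * stress_patterns


total_replies = expected_replies(LISTENERS, TEST1_STIMULI, TEST2_STIMULI)
total_tokens = recorded_tokens(10, 8, 3)
print(total_replies == REPORTED_REPLIES, total_tokens == REPORTED_RECORDED_WORDS)
```

Both totals agree with the reported figures: 30 x (336 + 280)
= 18480 replies and 10 x 8 x 3 = 240 recorded words.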
Results corresponding to the judgements about stimuli with
modified acoustic parameters are shown below. Results
from the first test, in which subjects were asked to identify
the syllable bearing the stress, are presented in Table 1.
Results obtained from the second test, in which subjects
had to decide whether the two words of a pair coincided in
their stress pattern, are shown in Table 2. In both tables,
results for the manipulation of each acoustic parameter in
isolation are presented first (F0, D, I), followed by those
obtained with the modification of paired acoustic cues (D+I,
F0+D, F0+I); finally, the results of the simultaneous
manipulation of F0, duration and intensity are given
(F0+D+I). Results for meaningful and for meaningless words
are given in separate rows. The second column indicates the
modification that has been performed on the stimuli: for
example, “PP with P values” means an originally
proparoxytone word in which the values of the target
acoustic parameters have been replaced by those of a
paroxytone word. In columns 3, 4 and 5 of Table 1, the
percentages of identification of the stimuli as a
proparoxytone (PP), a paroxytone (P) or an oxytone (O) word
are presented. In Table 2, “S” and “D” in the columns
labelled “PP”, “P” and “O” correspond to the percentage of
paired words judged as same or different.
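The seven manipulation conditions listed above (three single
cues, three pairs and the full triple) are simply all the
non-empty subsets of the three cues. A minimal sketch,
assuming the labels F0, D and I used in the tables; the
function name is hypothetical, and the order produced here
differs slightly from the order of the tables, which list D+I
before F0+D and F0+I.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

CUES = ("F0", "D", "I")  # fundamental frequency, duration, intensity


def cue_conditions(cues: Sequence[str] = CUES) -> List[Tuple[str, ...]]:
    """Return every non-empty combination of cues, smallest sets first."""
    return [
        combo
        for size in range(1, len(cues) + 1)
        for combo in combinations(cues, size)
    ]


labels = ["+".join(combo) for combo in cue_conditions()]
print(labels)
```

Each label corresponds to one block of rows in Tables 1 and 2.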
Cue     Modification       PP      P       O       Words
F0      PP with P values   61.67   38.33   0       meaningful
                           52.78   45      2.22    meaningless
F0      P with O values    15      70.56   14.44   meaningful
                           6.11    69.44   24.44   meaningless
D       PP with P values   99.44   0.56    0       meaningful
                           96.67   3.33    0       meaningless
D       P with O values    2.22    96.11   1.67    meaningful
                           13.33   85      1.67    meaningless
I       PP with P values   98.33   1.11    2.22    meaningful
                           98.33   1.67    0       meaningless
I       P with O values    0       97.78   2.22    meaningful
                           1.33    97.33   1.33    meaningless
D+I     PP with P values   93.23   6.77    0       meaningful
                           91.15   8.85    0       meaningless
D+I     P with O values    5.21    80.73   14.06   meaningful
                           7.03    79.69   16.41   meaningless
F0+D    PP with P values   4.17    94.79   1.04    meaningful
                           13.02   80.73   6.25    meaningless
F0+D    P with O values    5.73    16.67   77.60   meaningful
                           16.15   16.15   67.70   meaningless
F0+I    PP with P values   20.31   79.17   0.56    meaningful
                           22.92   70.31   6.77    meaningless
F0+I    P with O values    1.56    9.38    89.06   meaningful
                           4.17    28.91   66.40   meaningless
F0+D+I  PP with P values   1.04    98.44   0.52    meaningful
                           4.17    91.14   4.69    meaningless
F0+D+I  P with O values    0       1.56    98.44   meaningful
                           9.38    3.12    87.50   meaningless
Table 1: Results in % from test 1 for meaningful and
meaningless words (F0 = fundamental frequency; D = duration;
I = intensity; PP = proparoxytone; P = paroxytone;
O = oxytone).
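The pattern in Table 1 can be made explicit by looking, for
each condition, at the percentage of responses that follow
the stress pattern of the donor word. The sketch below copies
the values for meaningful words from Table 1; the 50%
threshold and the function name are assumptions introduced
here for illustration and are not part of the original
analysis.

```python
from typing import Dict, List, Tuple

# Meaningful words, test 1: % of responses matching the donor pattern
# (PP stimuli heard as P, P stimuli heard as O), copied from Table 1.
SHIFT_TO_DONOR: Dict[str, Tuple[float, float]] = {
    "F0": (38.33, 14.44),
    "D": (0.56, 1.67),
    "I": (1.11, 2.22),
    "D+I": (6.77, 14.06),
    "F0+D": (94.79, 77.60),
    "F0+I": (79.17, 89.06),
    "F0+D+I": (98.44, 98.44),
}

THRESHOLD = 50.0  # assumed cut-off for "most listeners hear the shift"


def majority_shift(table: Dict[str, Tuple[float, float]],
                   threshold: float = THRESHOLD) -> List[str]:
    """Conditions in which both manipulations exceed the threshold."""
    return [
        condition
        for condition, rates in table.items()
        if all(rate > threshold for rate in rates)
    ]


shifting = majority_shift(SHIFT_TO_DONOR)
print(shifting)  # ['F0+D', 'F0+I', 'F0+D+I']
```

Under this threshold, every condition that shifts the
perceived stress includes F0 together with at least one other
cue.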
In the identification test, meaningful proparoxytone words
with a single modified parameter are rarely perceived as
paroxytone, and under the same conditions meaningful
paroxytone words are perceived as oxytone in very few cases.
When the F0 contour is the superimposed parameter, the
results show the same tendency but with higher scores:
meaningful proparoxytone words with a paroxytone F0 contour
are perceived as paroxytone in 38.33% of the cases, while
meaningful paroxytone words with an oxytone F0 contour
are perceived as oxytone in 14.44% of the cases. On the
other hand, proparoxytone stimuli with duration values taken
from paroxytone words are perceived as paroxytone in no more
than 3.33% of the cases for meaningless words and 0.56% for
meaningful ones, while paroxytone words with duration values
from oxytone words are perceived as oxytone in at most 1.67%
of the cases. Results corresponding to intensity values show
a very similar trend.
When two parameters are modified simultaneously, the results
show that listeners do not really perceive the effect of the
modification unless one of the superimposed parameters is
the F0 contour. For instance, although meaningful
proparoxytone words with intensity and duration values from
paroxytone words are perceived as paroxytone in only 6.77%
of the cases, they are clearly perceived as paroxytone (in
94.79% of the cases) when F0 and duration are the
superimposed parameters. Similarly, if the modified
parameters are F0 and intensity, meaningful proparoxytone
words are perceived as paroxytone in 79.17% of the cases.
Percentages increase when the modification affects F0,
intensity contours and duration simultaneously; in these
cases the rates reach 98.44%. Although percentages are
generally somewhat lower, the same behaviour is observed in
meaningless words.
It is worth noting that the simultaneous modification of
F0 and other acoustic parameters triggers a high percentage
of responses showing that a change in the stress pattern has
been detected. This trend is also confirmed by the
judgements obtained from the second test, as shown below.
In Table 2, it can be observed that meaningful
proparoxytone words with intensity values taken from
paroxytone words are perceived as equal to paroxytones in
only 0.83% of the cases. When duration is manipulated the
same result is obtained (0.83%). If the F0 contour is
manipulated but the original values of the other parameters
are maintained, listeners judge the stimuli neither clearly
as equal to paroxytones (32.5%) nor clearly as equal to
proparoxytones (45.83%). The same tendency is noted for
meaningful paroxytone words: they are practically never
perceived as equal to oxytones when duration (0.83%) or
intensity (0%) is individually manipulated. In the case of
F0 manipulation, listeners judge them as different from
oxytones (94.17%), but they are judged as equal to
paroxytones in 59.17% of the replies. The same tendency
is observed in meaningless words.
When two acoustic cues are combined, results show a
similar tendency if duration and intensity appear
together: meaningful proparoxytone words with intensity
contours and duration values from paroxytone words are
perceived as equal to paroxytones in 4.69% of the cases and
as different from proparoxytones in 14.84% of the cases;
paroxytone words with intensity contours and duration
values from oxytone words are perceived as equal to
oxytones in 14.84% and as different from paroxytones in
24.22% of the cases. In contrast, proparoxytone words
are perceived as equal to paroxytone words in 98.44% of the
cases when F0 is combined with duration, and in 77.34% when
F0 is combined with intensity. Paroxytone words are
perceived as equal to oxytone words in 73.44% of the cases
when F0 is combined with duration, and in 89.84% when F0 is
combined with intensity. Similar results are observed in
meaningless words.
Finally, results obtained from the combination of the three
acoustic parameters are similar to those obtained from the
manipulation of F0 together with one of the other two
parameters (duration or intensity).
Cue     Modification       PP            P             O             Words
                           S      D      S      D      S      D
F0      PP with P values   45.83  54.17  32.5   67.5                 meaningful
                           50     50     42.5   57.5                 meaningless
F0      P with O values                  59.17  40.83  5.83   94.17  meaningful
                                         75.83  24.17  14.17  85.83  meaningless
D       PP with P values   98.33  1.67   0.83   99.17                meaningful
                           100    0      2.5    97.50                meaningless
D       P with O values                  95.83  4.17   0.83   99.17  meaningful
                                         84.17  15.83  0.83   99.17  meaningless
I       PP with P values   98.33  1.67   0.83   99.17                meaningful
                           100    0      0      100                  meaningless
I       P with O values                  98.33  1.67   0      100    meaningful
                                         96.67  3.33   2.22   97.78  meaningless
D+I     PP with P values   85.16  14.84  4.69   95.31                meaningful
                           88.28  11.72  13.28  86.72                meaningless
D+I     P with O values                  75.78  24.22  14.84  85.16  meaningful
                                         72.92  27.08  5.21   94.79  meaningless
F0+D    PP with P values   0.78   99.22  98.44  1.56                 meaningful
                           0.38   90.62  89.84  10.16                meaningless
F0+D    P with O values                  14.06  85.94  73.44  26.56  meaningful
                                         28.91  71.09  64.06  35.94  meaningless
F0+I    PP with P values   8.59   91.44  77.34  22.66                meaningful
                           19.53  80.47  66.41  33.59                meaningless
F0+I    P with O values                  12.50  87.5   89.84  10.16  meaningful
                                         43.75  56.25  50     50     meaningless
F0+D+I  PP with P values   14.84  85.16  71.09  28.91                meaningful
                           25.78  74.22  61.72  38.28                meaningless
F0+D+I  P with O values                  17.97  82.03  83.59  16.41  meaningful
                                         47.92  52.08  44.79  55.21  meaningless
Table 2: Results in % from test 2 for meaningful and
meaningless words (F0 = fundamental frequency; D = duration;
I = intensity; PP = proparoxytone; P = paroxytone;
O = oxytone; S = same; D = different).
4. CONCLUSIONS
The difficulty of studying the influence of acoustic
parameters on stress perception stems from the fact that
they act simultaneously in natural speech. The method used
in this experiment offers a way of isolating the effect of
each parameter while maintaining the naturalness of the
stimuli by using resynthesised natural speech.
The results reveal that different conclusions can be
drawn depending on the way F0, duration and intensity are
combined. The superposition of only one of the three
parameters corresponding to another stress pattern is not
sufficient for a clear change of the stress pattern to be
perceived. Only the superposition of F0 in combination with
one or more parameters triggers a high number of responses
indicating a change in the stress location.
The findings can be briefly summarized as follows. First,
the position of the stress is correctly identified by subjects
if the F0 peak, duration or intensity values correspond to
the lexically stressed syllable. Second, in those cases in
which the F0 contour, the intensity contour or the duration
values alone have been replaced by superimposed ones in an
attempt to displace the perceived prominence to the right,
listeners still identify the syllable originally bearing the
lexical stress, in spite of the modification. This behaviour
is found for meaningful and for meaningless words. Third,
when the values of the F0 contour are superimposed together
with those of one or both of the other parameters (duration
and intensity), listeners identify the syllable aligned with
the F0 peak as stressed. In contrast, if F0 is not included
in the combination, the syllable with the original lexical
stress is the one perceived as stressed by the listeners.
Results for meaningless words show a similar tendency, but
the percentages of identification are lower.
It can be concluded that, at least in isolated words, the F0
contour alone is not sufficient to induce the identification of
the syllable aligned with the F0 peak as stressed if the other
two parameters are not modified, but it is an essential
acoustic cue in combination with duration, intensity or both
duration and intensity.
REFERENCES
[1] E. Enríquez, C. Casado and A. Santos, "La percepción
del acento en español", Lingüística Española Actual, vol. 11,
pp. 241-269, 1989.
[2] J. Llisterri, M. J. Machuca, C. de la Mota, M. Riera and
A. Ríos, "The role of F0 peaks in the identification of
lexical stress in Spanish", in Phonetics and its Applications.
Festschrift for Jens-Peter Köster on the Occasion of his
60th Birthday, A. Braun and H.R. Masthoff, Eds., pp.
350-361. Stuttgart: Franz Steiner Verlag, 2002.
[3] J. Llisterri, M. J. Machuca, C. de la Mota, M. Riera and
A. Ríos, "Algunas cuestiones en torno al desplazamiento
acentual en español", in La tonía: dimensiones fonéticas y
fonológicas. México: El Colegio de México, 2002.
http://liceu.uab.es/~joaquim/publicacions/Llisterri_et_al_2002.pdf