Presenting in English and Swedish

Proceedings, FONETIK 2005, Department of Linguistics, Göteborg University
Rebecca Hincks
Department of Speech, Music and Hearing, KTH, Stockholm
Unit for Language and Communication, KTH, Stockholm
Abstract
This paper reports on a comparison of prosodic
variables from oral presentations in a first and
second language. Five Swedish natives who
speak English at the advanced-intermediate
level were recorded as they made the same
presentation twice, once in English and once in
Swedish. Though it was expected that speakers
would use more pitch variation when they
spoke Swedish, three of the five speakers
showed no significant difference between the
two languages. All speakers spoke more quickly
in Swedish, the mean being 20% faster.
Introduction
Two earlier contributions to the Annual Swedish Phonetics Conference have outlined ideas
for a feedback mechanism for public speaking.
Briefly, Hincks 2003 proposed that speech
technology be used to support the practice of
oral presentations. Speech recognition could
give feedback on repeated segmental errors
produced by non-natives as well as provide a
transcript of the presentation, which could then
be processed for lexical and syntactic appropriateness. Speech analysis could give feedback
on the speaker’s prosodic variability and speaking rate. Hincks 2004 presented an analysis of
pitch variation in a corpus of second language
student presentation speech. Pitch variation was
measured as the standard deviation of F0 for
10-second long segments of speech, normalized
by dividing by the mean F0 for that segment.
This value was termed PVQ, for pitch variation
quotient. Hincks (forthcoming) reports on the
results of a perception test of speaker liveliness,
where a strong correlation (r = .83, n = 18, p <
.01) was found between speaker PVQ and perceptions of liveliness from a panel of eight
judges.
Though automatic feedback on the prosody
of public speaking could be useful for both first
and second language users, the abovementioned studies have been done on a corpus
of L2 English, where native Swedish students
of Technical English were recorded as they
made oral presentations as part of their course
requirements. The aim of the small study described in this paper was to gather data to shed
light on the question of how individual speakers might differ in speaking characteristics
when presenting in a first or second language.
Other research has suggested that a narrowed
pitch range is a characteristic of second language speech (Mennen 1998; Pickering 2004),
at the same time as it has been shown that using
pitch effectively is an important means of structuring instructional discourse. In situations such
as exist in Sweden, where students are increasingly judged on tasks performed in a second
language, it is of interest to know the extent to
which that requirement constrains them.
This paper investigates pitch variation levels and speaking rates in the English and Swedish versions of the same presentations. If
speakers were found to use less pitch variation
when speaking English than Swedish, then second language users could be seen as primary
users of a system for encouraging more pitch
variation. It was expected that speaking rates
would be faster for Swedish than for English;
this examination could quantify the differences.
Method
The goal for the data collection used for this
paper was to have a corpus where the same
speaker used both English and Swedish to
make the same presentation, with the same visual material. Because class time could not be
wasted with having students hear the same
presentation twice, the Swedish recordings
needed to be made outside the classroom. All
students studying English at KTH in the fall of
2004 (nearly 100 students) were contacted
and asked whether they would like to participate. They were told that they would first be
recorded in the classroom as they made their
presentations in English, and that they would
then meet in groups and make the same presentations in Swedish to each other. They were offered 150 SEK as compensation for the extra
time it would take. Unfortunately, only five
students were able to participate. These five,
three males and two females, were all intermediate students. They were first recorded in their
English classroom, and then met at the end of
term to be recorded in Swedish. The audience
for the second recording consisted of the other
four students, their English teacher, and me.
Four of the five students used computer-based
visual support for their presentations, and were
instructed to use their English slides for the
Swedish presentation. This assured that the
content of the presentations would be the same.
One student, M3, did not use extensive visual
support.
With WaveSurfer’s (Sjölander and Beskow
2000) ESPS pitch extraction boundaries set at
65-325 Hz for male speakers and 110-550 Hz
for female speakers, pitch extraction was performed for up to 10 minutes of speech for the
five presentations in each language. All pitch
contours were visually inspected for evidence
of extraction errors and the location of the errors noted. The F0 values were exported to a
spreadsheet program, where the erroneous values were deleted, and the means and standard
deviations of 10-second long segments were
calculated. The standard deviation of each segment was divided by the mean of each segment
to determine the PVQ, pitch variation quotient.
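
The PVQ calculation described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet procedure actually used: the function name, the input layout (parallel lists of frame times and F0 values) and the choice of the sample standard deviation are assumptions, since the text does not say which form of standard deviation was applied.

```python
from statistics import mean, stdev

def pvq_per_segment(times, f0_values, segment_seconds=10.0):
    """Return one PVQ (std of F0 / mean of F0) per fixed-length segment.

    times and f0_values are parallel lists (seconds, Hz). Frames whose F0
    is None or <= 0 stand for unvoiced frames or deleted extraction errors.
    Assumption: sample standard deviation (statistics.stdev).
    """
    segments = {}
    for t, f0 in zip(times, f0_values):
        if f0 is None or f0 <= 0:
            continue
        segments.setdefault(int(t // segment_seconds), []).append(f0)
    # A segment needs at least two voiced frames for a standard deviation.
    return [stdev(v) / mean(v) for _, v in sorted(segments.items()) if len(v) >= 2]

# Hypothetical toy data: two 10-second segments with three voiced frames each.
example_times = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
example_f0 = [100.0, 110.0, 120.0, 200.0, 200.0, 200.0]
example_pvq = pvq_per_segment(example_times, example_f0)
```

Averaging the per-segment values would then give one mean PVQ per presentation, which is the quantity plotted per speaker in Figure 1.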
Speaking rate was calculated by manually
dividing the transcripts of the presentations into
syllables and dividing by the total time spent
speaking. Because pause time is included in the
calculation, the values achieved are lower than
what might otherwise be found in studies of
spontaneous speech. Another temporal value of
interest is the mean length of runs, which is the
amount of speech, in syllables, a speaker utters
between pauses. This measure has been found
to correlate highly with language proficiency
(Kormos and Dénes 2004). The minimum
pause length was defined as 250 ms.
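
The two temporal measures can be illustrated with a minimal Python sketch. The input format is hypothetical: each stretch of speech is given as a syllable count together with the length of the silence that follows it. Only the 250 ms threshold comes from the text; whether a silence of exactly 250 ms counts as a pause is an assumption here.

```python
MIN_PAUSE_S = 0.25  # minimum pause length from the text (250 ms)

def speaking_rate(total_syllables, total_seconds):
    """Syllables per second over the whole speaking time, pauses included."""
    return total_syllables / total_seconds

def mean_length_of_runs(stretches, min_pause=MIN_PAUSE_S):
    """Mean number of syllables between pauses.

    stretches: list of (syllable_count, following_silence_seconds).
    Silences shorter than min_pause do not end a run.
    """
    runs, current = [], 0
    for syllables, silence in stretches:
        current += syllables
        if silence >= min_pause:  # a real pause closes the current run
            runs.append(current)
            current = 0
    if current:  # final run with no trailing pause
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0

# Hypothetical data: the 0.1 s silence is too short to split the first run.
example_stretches = [(5, 0.1), (7, 0.4), (6, 0.3), (2, 0.0)]
example_mlr = mean_length_of_runs(example_stretches)  # runs of 12, 6 and 2 syllables
example_rate = speaking_rate(30, 10.0)
```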
[Figure: y-axis, pitch variation quotient; x-axis, speaker (M1, M2, M3, F1, F2); series, English and Swedish.]
Figure 1. Mean pitch variation quotient for whole
presentation in both English and Swedish
Temporal measures
The male speakers spoke for a shorter length of
time when making the presentation in Swedish
than when using English, as shown in Figure 2.
[Figure: y-axis, seconds; x-axis, speaker (M1, M2, M3, F1, F2); series, English and Swedish.]
Figure 2. Length of time in seconds to make presentation in English and Swedish
Speaking rate
Part of the reason the speakers could make their
presentations in a shorter period of time is that
they spoke on average 20% more quickly. Figure 3 shows the speaking rate per speaker in
syllables per second. The mean speaking rate in
English was 2.97 sps, and for Swedish was 3.58
sps. M3, the only student to use a lot more
pitch variation in Swedish than in English, also
spoke much more quickly in Swedish. Note
also that the two females are more stable in
their speaking rates, and that the fastest and
slowest speakers in one language maintain their
ranking in the other language.
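
As a quick arithmetic check, the 20% figure can be compared with the two group means reported above. The sketch below assumes the percentage is the relative increase of the Swedish mean over the English mean; the text does not state how it was derived, and an average of per-speaker differences could give a slightly different value.

```python
def relative_increase(baseline, value):
    """Relative increase of value over baseline, as a fraction."""
    return (value - baseline) / baseline

english_sps = 2.97  # mean speaking rate in English (from the text)
swedish_sps = 3.58  # mean speaking rate in Swedish (from the text)

increase = relative_increase(english_sps, swedish_sps)
print(f"{increase:.1%}")  # prints 20.5%
```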
Results
Pitch variation quotients
The mean PVQs per speaker for the two presentations are shown in Figure 1. For three of
the five speakers, there was very little difference in the PVQs when using English and when
using Swedish. Only one speaker, M3, had a significantly lower PVQ when speaking English, but another, F1, had a lower PVQ when speaking
Swedish. Though there are only five speakers,
the mean values reflect the same range as that
found in the all-English corpus, with a low of
about 0.11 and a high of about 0.24.
the larger, all-English corpus, where an attempt
was made to gather data from every student in a
class. It is reassuring, however, that the ranges
of prosodic variables for these five speakers
reflect nearly the same ranges as those of the first
corpus.
Language or performance?
The result that three of five speakers showed no
significant difference in PVQ depending on the
language they were using would seem to indicate that PVQ measures are more speaker dependent than language dependent, at least for
native speakers of Swedish. The hypothesis that
the speakers would use less pitch variation
when speaking English was not at all borne out
by the study. It seems that the PVQ depends
mostly on speaking style, and perhaps the energy one puts into ‘performing’ in a certain
situation. The English presentation was a
higher-stakes event, where students were
speaking to more people and, most importantly,
receiving a grade on their work. Speaker F1
performed very well for her first presentation,
and with the high mean length of runs combined with higher-than-average mean PVQ,
probably would have received high liveliness
ratings had her speech been part of the perception test. It is interesting that she was the only
student to have lower PVQ values and the only
student to have lower MLR values in Swedish
than in English. This could indicate that she in
some way put less effort into performance for
the Swedish presentation. Speaker M3, on the
other hand, was either hampered by using English or relatively unprepared when making the
first presentation. He could have benefited from
rehearsing with a feedback mechanism beforehand.
For the purposes of a thesis grounded in
computer-assisted language learning, these results throw a bit of a wrench in the works. The
problems I am proposing to address may not depend on the use of a second language, but on
more basic features of speaking style. On the
other hand, at advanced levels of language
courses, it is difficult to separate the needs of
first and second language users. Furthermore,
many native speakers as well as non-natives
obviously have problems achieving an engaging speaking style, and it has never been my
intention to propose a device restricted to non-native use.
[Figure: y-axis, syllables per second; x-axis, speaker (M1, M2, M3, F1, F2); series, English and Swedish.]
Figure 3. Speaking rate in syllables per second for
three males and two females in English and Swedish
Mean length of runs
A variable found to be important in the perception of liveliness in female speech samples
(Hincks forthcoming) is the number of syllables
between pauses of >250 ms (MLR). Four of the
five speakers had higher values for this measure when speaking Swedish (Figure 4). The exception was F1, the same speaker who used less
pitch variation in Swedish.
[Figure: y-axis, mean length of runs; x-axis, speaker (M1, M2, M3, F1, F2); series, English and Swedish.]
Figure 4. Mean length of runs (number of syllables
between >250 ms pauses) using English and Swedish
Discussion
This study was performed on a small group of
speakers, and any results should be interpreted
with care. The students who participated were
paid volunteers, and in that sense cannot be
considered as representative of the population
to the same extent as the speakers recorded for
Further work
A small study is being planned to test the perception of liveliness in these speakers as they
used the two languages.
The corpus described in this paper could
be augmented by a small number of speakers
over the period of several terms and could provide a wealth of further opportunities for language study. Comparison of the English and
Swedish transcripts will allow examination of
aspects such as how the speakers use pitch
movement in utterances that are comparable
content-wise. This could provide insight into
transfer of Swedish intonational patterns to
English. It is possible that with more speakers,
statistically significant differences in PVQ
could still be found. The differences in mean
speaking rate should also be further investigated; the 20% difference found in this group
would be interesting to pursue. Does the average Swedish speaker of English manage to say
only 80% of what a native speaker can say during the allotted time at a conference? Documenting such information about first and second language use would give valuable evidence
for those in a position to develop language
policy.
References
Hincks, R. (2003). Tutors, tools and assistants
for the L2 user. Phonum 9: 173-176, Umeå
University Department of Philosophy and
Linguistics.
Hincks, R. (2004). Standard deviation of F0 in
student monologue. Proceedings of Fonetik
2004, Stockholm, Department of Linguistics, Stockholm University.
Hincks, R. (forthcoming). Measures and perceptions of liveliness in student oral presentation speech: a proposal for an automatic
feedback mechanism. Accepted for publication in System.
Kormos, J. and M. Dénes (2004). Exploring
measures and perceptions of fluency in the
speech of second language learners. System
32: 145-164.
Mennen, I. (1998). Can language learners ever
acquire the intonation of a second language?
Proceedings of STiLL 98, Marholmen, Sweden, KTH Department of Speech, Music and
Hearing.
Pickering, L. (2004). The structure and function
of intonational paragraphs in native and nonnative speaker instructional discourse. English for Specific Purposes 23: 19-43.
Sjölander, K. and J. Beskow (2000). WaveSurfer: An open source speech tool. Proceedings of ICSLP 2000,
http://www.speech.kth.se/snack/.
Acknowledgements
My thanks to David House, the student
speakers and especially to teacher Beyza Björkman, whose encouragement was important in
getting five volunteers for this study. This work
was funded by the Unit for Language and
Communication.