Proceedings, FONETIK 2005, Department of Linguistics, Göteborg University

Presenting in English and Swedish

Rebecca Hincks
Department of Speech, Music and Hearing, KTH, Stockholm
Unit for Language and Communication, KTH, Stockholm

Abstract

This paper reports on a comparison of prosodic variables from oral presentations in a first and second language. Five Swedish natives who speak English at the advanced-intermediate level were recorded as they made the same presentation twice, once in English and once in Swedish. Though it was expected that speakers would use more pitch variation when they spoke Swedish, three of the five speakers showed no significant difference between the two languages. All speakers spoke more quickly in Swedish, the mean being 20% faster.

Introduction

Two earlier contributions to the Annual Swedish Phonetics Conference have outlined ideas for a feedback mechanism for public speaking. Briefly, Hincks 2003 proposed that speech technology be used to support the practice of oral presentations. Speech recognition could give feedback on repeated segmental errors produced by non-natives as well as provide a transcript of the presentation, which could then be processed for lexical and syntactic appropriateness. Speech analysis could give feedback on the speaker's prosodic variability and speaking rate. Hincks 2004 presented an analysis of pitch variation in a corpus of second language student presentation speech. Pitch variation was measured as the standard deviation of F0 for 10-second long segments of speech, normalized by dividing by the mean F0 for that segment. This value was termed PVQ, for pitch variation quotient. Hincks (forthcoming) reports on the results of a perception test of speaker liveliness, where a strong correlation (r = .83, n = 18, p < .01) was found between speaker PVQ and perceptions of liveliness from a panel of eight judges.

Though automatic feedback on the prosody of public speaking could be useful for both first and second language users, the above-mentioned studies have been done on a corpus of L2 English, where native Swedish students of Technical English were recorded as they made oral presentations as part of their course requirements. The aim of the small study described in this paper was to gather data to shed light on the question of how individual speakers might differ in speaking characteristics when presenting in a first or second language. Other research has suggested that a narrowed pitch range is a characteristic of second language speech (Mennen 1998; Pickering 2004), at the same time as it has been shown that using pitch effectively is an important means of structuring instructional discourse. In situations such as exist in Sweden, where students are increasingly judged on tasks performed in a second language, it is of interest to know the extent to which that requirement constrains them.

This paper investigates pitch variation levels and speaking rates in the English and Swedish versions of the same presentations. If speakers were found to use less pitch variation when speaking English than Swedish, then second language users could be seen as primary users of a system for encouraging more pitch variation. It was expected that speaking rates would be faster for Swedish than for English; this examination could quantify the differences.

Method

The goal for the data collection used for this paper was to have a corpus where the same speaker used both English and Swedish to make the same presentation, with the same visual material. Because class time could not be wasted with having students hear the same presentation twice, the Swedish recordings needed to be made outside the classroom. All students studying English at KTH in the fall of 2004 (nearly 100 students) were contacted and asked whether they would like to participate.
They were told that they would first be recorded in the classroom as they made their presentations in English, and that they would then meet in groups and make the same presentations in Swedish to each other. They were offered 150 SEK as compensation for the extra time it would take. Unfortunately, only five students were able to participate. These five, three males and two females, were all intermediate students. They were first recorded in their English classroom, and then met at the end of term to be recorded in Swedish. The audience for the second recording consisted of the other four students, their English teacher, and me.

Four of the five students used computer-based visual support for their presentations, and were instructed to use their English slides for the Swedish presentation. This assured that the content of the presentations would be the same. One student, M3, did not use extensive visual support.

With WaveSurfer's (Sjölander and Beskow 2000) ESPS pitch extraction boundaries set at 65-325 Hz for male speakers and 110-550 Hz for female speakers, pitch extraction was performed for up to 10 minutes of speech for the five presentations in each language. All pitch contours were visually inspected for evidence of extraction errors and the location of the errors noted. The F0 values were exported to a spreadsheet program, where the erroneous values were deleted, and the means and standard deviations of 10-second long segments were calculated. The standard deviation of each segment was divided by the mean of each segment to determine the PVQ, pitch variation quotient.

Speaking rate was calculated by manually dividing the transcripts of the presentations into syllables and dividing by the total time spent speaking. Because pause time is included in the calculation, the values achieved are lower than what might otherwise be found in studies of spontaneous speech.
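The PVQ computation described in this section was carried out in a spreadsheet; as a rough illustration only, the same steps can be sketched in Python. The input format (a list of time and F0 pairs), the function name `segment_pvqs`, the skipping of frames with F0 of zero, and the use of the sample standard deviation are all assumptions made for this sketch and are not specified in the paper.

```python
import statistics
from collections import defaultdict

SEGMENT_S = 10.0  # segment length in seconds, as in the paper


def segment_pvqs(samples, segment_s=SEGMENT_S):
    """Return one PVQ value per segment, in time order.

    samples: iterable of (time_s, f0_hz) pairs (hypothetical input format).
    Frames with f0 <= 0 are skipped, on the assumption that unvoiced
    frames are coded as 0 and that erroneous values were already removed.
    """
    segments = defaultdict(list)
    for time_s, f0_hz in samples:
        if f0_hz > 0:
            segments[int(time_s // segment_s)].append(f0_hz)

    pvqs = []
    for index in sorted(segments):
        values = segments[index]
        if len(values) < 2:
            continue  # a standard deviation needs at least two values
        # PVQ: standard deviation of F0 divided by mean F0 of the segment.
        # Sample standard deviation is an assumption of this sketch.
        pvqs.append(statistics.stdev(values) / statistics.mean(values))
    return pvqs


# Tiny made-up example: two segments, one unvoiced frame.
example = [(0.0, 100.0), (1.0, 120.0), (2.0, 140.0), (5.0, 0.0),
           (10.0, 200.0), (11.0, 200.0)]
example_pvqs = segment_pvqs(example)
```

A per-presentation value such as the one plotted per speaker could then be obtained with `statistics.mean(example_pvqs)`, again assuming that the presentation mean is the mean of the segment values.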
Another temporal value of interest is the mean length of runs, which is the amount of speech, in syllables, a speaker utters between pauses. This measure has been found to correlate highly with language proficiency (Kormos and Dénes 2004). The minimum pause length was defined as 250 ms.

Results

Pitch variation quotients

The mean PVQs per speaker for the two presentations are shown in Figure 1. For three of the five speakers, there was very little difference in the PVQs when using English and when using Swedish. Only one speaker, M3, had significantly lower PVQ speaking English, but another, F1, had lower PVQ when speaking Swedish. Though there are only five speakers, the mean values reflect the same range as that found in the all-English corpus, with a low of about 0.11 and a high of about 0.24.

Figure 1. Mean pitch variation quotient for whole presentation in both English and Swedish. (Bar chart; x-axis: speaker M1, M2, M3, F1, F2; y-axis: pitch variation quotient; series: English, Swedish.)

Temporal measures

The male speakers spoke for a shorter length of time when making the presentation in Swedish than when using English, as shown in Figure 2.

Figure 2. Length of time in seconds to make presentation in English and Swedish. (Bar chart; x-axis: speaker M1, M2, M3, F1, F2; y-axis: seconds; series: English, Swedish.)

Speaking rate

Part of the reason the speakers could make their presentations in a shorter period of time is that they spoke on average 20% more quickly. Figure 3 shows the speaking rate per speaker in syllables per second. The mean speaking rate in English was 2.97 sps, and for Swedish was 3.58 sps. M3, the only student to use a lot more pitch variation in Swedish than in English, also spoke much more quickly in Swedish. Note also that the two females are more stable in their speaking rates, and that the fastest and slowest speakers in one language maintain their ranking in the other language.
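The temporal measures defined in the Method section (speaking rate with pause time included, and mean length of runs with a 250 ms minimum pause) can be sketched as below. This is an illustration under stated assumptions, not the procedure used in the study, where syllables were counted manually from transcripts: the input format (a time-ordered list of syllable start and end times), all function names, and the choice to treat a gap of exactly 250 ms as a pause are hypothetical.

```python
MIN_PAUSE_S = 0.25  # minimum pause length from the paper (250 ms)


def run_lengths(syllables, min_pause=MIN_PAUSE_S):
    """Return the number of syllables in each run between pauses.

    syllables: non-empty list of (start_s, end_s) tuples in time order
    (hypothetical input format). A silent gap of at least min_pause
    between two syllables is counted as a pause (assumption).
    """
    runs = []
    current = 0
    previous_end = None
    for start_s, end_s in syllables:
        if previous_end is not None and start_s - previous_end >= min_pause:
            runs.append(current)  # the pause closes the current run
            current = 0
        current += 1
        previous_end = end_s
    if current:
        runs.append(current)
    return runs


def mean_length_of_runs(syllables, min_pause=MIN_PAUSE_S):
    """Mean number of syllables uttered between pauses."""
    runs = run_lengths(syllables, min_pause)
    return sum(runs) / len(runs)


def speaking_rate(syllables):
    """Syllables per second over the total time, pauses included."""
    total_time = syllables[-1][1] - syllables[0][0]
    return len(syllables) / total_time


def percent_faster(rate_a, rate_b):
    """How much faster rate_a is than rate_b, in percent."""
    return (rate_a / rate_b - 1.0) * 100.0


# Made-up example: six syllables with one 400 ms pause after the third.
example = [(0.0, 0.2), (0.2, 0.4), (0.4, 0.6),
           (1.0, 1.2), (1.2, 1.4), (1.5, 1.7)]
example_runs = run_lengths(example)
```

Applied to the mean rates reported in the paper, `percent_faster(3.58, 2.97)` comes to roughly 20.5, which is in line with the stated figure of about 20%.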
Figure 3. Speaking rate in syllables per second for three males and two females in English and Swedish. (Bar chart; x-axis: speaker M1, M2, M3, F1, F2; y-axis: syllables per second; series: English, Swedish.)

Mean length of runs

A variable found to be important in the perception of liveliness in female speech samples (Hincks forthcoming) is the number of syllables between pauses of >250 ms (MLR). Four of the five speakers had higher values for this measure when speaking Swedish (Figure 4). The exception was F1, the same speaker who used less pitch variation in Swedish.

Figure 4. Mean length of runs (number of syllables between >250 ms pauses) using English and Swedish. (Bar chart; x-axis: speaker M1, M2, M3, F1, F2; y-axis: mean length of runs; series: English, Swedish.)

Discussion

This study was performed on a small group of speakers, and any results should be interpreted with care. The students who participated were paid volunteers, and in that sense cannot be considered as representative of the population to the same extent as the speakers recorded for the larger, all-English corpus, where an attempt was made to gather data from every student in a class. It is reassuring, however, that the ranges of prosodic variables for these five speakers reflect nearly the same ranges as those of the first corpus.

Language or performance?

The result that three of five speakers showed no significant difference in PVQ depending on the language they were using would seem to indicate that PVQ measures are more speaker dependent than language dependent, at least for native speakers of Swedish. The hypothesis that the speakers would use less pitch variation when speaking English was not at all borne out by the study. It seems that the PVQ depends mostly on speaking style, and perhaps the energy one puts into 'performing' in a certain situation. The English presentation was a higher-stakes event, where students were speaking to more people and, most importantly, receiving a grade on their work.

Speaker F1 performed very well for her first presentation, and with the high mean length of runs combined with higher-than-average mean PVQ, probably would have received high liveliness ratings had her speech been part of the perception test. It is interesting that she was the only student to have lower PVQ values and the only student to have lower MLR values in Swedish than in English. This could indicate that she in some way put less effort into performance for the Swedish presentation. Speaker M3, on the other hand, was either hampered by using English or relatively unprepared when making the first presentation. He could have benefited by rehearsing with a feedback mechanism beforehand.

For the purposes of a thesis grounded in computer-assisted language learning, these results throw a bit of a wrench in the works. The problems I am proposing to help may not depend on the use of a second language, but on more basic features of speaking style. On the other hand, at advanced levels of language courses, it is difficult to separate the needs of first and second language users. Furthermore, many native speakers as well as non-natives obviously have problems achieving an engaging speaking style, and it has never been my intention to propose a device restricted to non-native use.

Further work

A small study is being planned to test the perception of liveliness in these speakers as they used the two languages. The corpus described in this chapter could be augmented by a small number of speakers over the period of several terms and could provide a wealth of further opportunities for language study.
Comparison of the English and Swedish transcripts will allow examination of aspects such as how the speakers use pitch movement in utterances that are comparable content-wise. This could provide insight into transfer of Swedish intonational patterns to English. It is possible that with more speakers, statistically significant differences in PVQ could still be found. The differences in mean speaking rate should also be further investigated; the 20% difference found in this group would be interesting to pursue. Does the average Swedish speaker of English manage to say only 80% of what a native speaker can say during the allotted time at a conference? Documenting such information about first and second language use would give valuable evidence for those responsible for developing language policy.

References

Hincks, R. (2003). Tutors, tools and assistants for the L2 user. Phonum 9: 173-176, Umeå University Department of Philosophy and Linguistics.

Hincks, R. (2004). Standard deviation of F0 in student monologue. Proceedings of Fonetik 2004, Stockholm, Department of Linguistics, Stockholm University.

Hincks, R. (forthcoming). Measures and perceptions of liveliness in student oral presentation speech: a proposal for an automatic feedback mechanism. Accepted for publication in System.

Kormos, J. and M. Dénes (2004). Exploring measures and perceptions of fluency in the speech of second language learners. System 32: 145-164.

Mennen, I. (1998). Can language learners ever acquire the intonation of a second language? Proceedings of STiLL 98, Marholmen, Sweden, KTH Department of Speech, Music and Hearing.

Pickering, L. (2004). The structure and function of intonational paragraphs in native and nonnative speaker instructional discourse. English for Specific Purposes 23: 19-43.

Sjölander, K. and J. Beskow (2000). WaveSurfer: An open source speech tool. Proceedings of ICSLP 2000, http://www.speech.kth.se/snack/.
Acknowledgements

My thanks to David House, the student speakers and especially to teacher Beyza Björkman, whose encouragement was important in getting five volunteers for this study. This work was funded by the Unit for Language and Communication.