Is Text-to-Speech Synthesis Ready for Use in CALL?
Zöe Handley, Learning Sciences Research Institute (LSRI), University of Nottingham
Marie-Josée Hamel, Department of French, Dalhousie University
CALL 2008, Antwerp, Belgium

Plan
– TTS synthesis in CALL
– Evaluation
– Requirements analysis
– Readiness of TTS synthesis for CALL
– Conclusions

TTS synthesis
What is TTS synthesis?
– Speech synthesis “systems … allow the generation of novel messages, either from scratch (i.e. entirely by rule) or by re-combining shorter prestored units” (van Bezooijen and van Heuven, 1997: 709)
– Text-to-speech (TTS) synthesis systems allow the automatic generation of speech from text
Why use TTS in CALL?
– There is a general need in language learning and teaching for “self-paced interactive learning environments” which provide “controlled interactive speaking practice outside the classroom” (Ehsani and Knodt, 1998: 45).
Demo: http://www.acapela-group.com/text-to-speech-interactive-demo.html (voices: Graham, Lucy)

CALL applications
Reading machine
Pronunciation model
– Auditory discrimination; repetition (Hamel, 1998; Mercier et al., 2000)
Conversational partner
(Image: Oxford-Hachette French Dictionary on CD-ROM)
Talking dictionaries, texts (de Pijper, 1997; Hamel, 2003), word processors and conjugators; dictations (Santiago-Oriola, 1999; Mercier et al., 1999); and grapheme-phoneme exercises
In combination with automatic speech recognition and speech understanding, the generative power of TTS synthesis can be harnessed to provide learners with interactive speaking practice, i.e. a dialogue partner (Raux and Eskenazi, 2004; Seneff et al., 2004)

Benefits of TTS synthesis
Improvements on other media
– Easy creation and editing of speech samples
– Simultaneous presentation of text and speech
– Low storage requirements
– Non-human and therefore perceived as non-judgemental
Adds value
– Generation of examples on demand (Sherwood, 1981), and therefore the automatic generation of feedback, conversational turns, and exercises with speech models

Why evaluation?
– Few CALL applications integrating TTS synthesis are available on the market
– Few evaluations of TTS synthesis for the purposes of CALL have been conducted
– Since the failure of the language laboratory, teachers have been sceptical about unevaluated technologies
– TTS synthesis is being used in CALL in roles it has not played in previous applications outside CALL; the most common, perhaps only, role that TTS synthesis assumes outside CALL is that of a reading machine

Framework for the evaluation of TTS synthesis for use in CALL (Handley and Hamel, 2005)
1. Basic research evaluation of TTS synthesis for use in CALL
2. Technology evaluation of TTS synthesis for use in CALL
3. Potential of the planned activity to provide ideal conditions for SLA
Usage evaluation of the teacher-planned activity
Potential of the CALL program to provide ideal conditions for SLA
Judgemental evaluation of the teacher-planned activity
5. Adequacy of TTS synthesis for use in CALL
Judgemental evaluation of the CALL application
4. Viability and potential benefits of the use of TTS synthesis in CALL
Learner’s performance in the planned activity
The framework combines the levels of evaluation recommended by Chapelle (2001) for the evaluation of CALL activities and by ELSE (1999) for the evaluation of Speech and Language Technologies (SALT).

Evaluations of TTS synthesis for CALL
Technology evaluations of TTS synthesis for use in CALL
  • Stratil et al. (1987) – Evaluated the quality of a Spanish TTS chip for presenting grammar exercises in a language laboratory.
Usage evaluations of the teacher-planned activity
– Outcome-oriented
  • Santiago-Oriola (1999) – Evaluated the use of a French TTS synthesiser for presenting dictation exercises.
  • Hincks (2002) – Evaluated the use of a Swedish TTS synthesiser, in combination with a speech editor (resynthesis), for teaching English lexical stress to Swedophones.
– Process-oriented
  • Cohen (1993) – Evaluated the use of a talking word processor to support literacy activities, namely story writing, for young learners of French as a second language.

Requirements analysis
The evaluation process
– ISO (1999) and EAGLES (1999) guidelines
– Establish the evaluation requirements
  • Establish the purpose of the evaluation
  • Identify the types of products to be evaluated
  • Specify the quality model
– Specify the evaluation
  • Select metrics
  • Establish rating levels for metrics
  • Establish criteria for assessment
– Design the evaluation
– Execute the evaluation
CALL requirements: “When the language competence of the system begins to outstrip that of some of the better second language users, such systems become useful adjunct tools” (Keller and Zellner-Keller, 2000)

CALL requirements analysis
Ideal conditions for second language acquisition (SLA) (Chapelle, 2001)
– Language learning potential
  • Goals of SLA
    – Communicative competence
    – Quality of the output
    – Primary requirement: comprehensibility/intelligibility
    – Secondary requirements: accuracy and naturalness
    – At both the level of individual speech sounds and the prosodic level
  • Focus on form
    – Flexibility
    – Speech rate, pitch

Explorative investigation (Handley and Hamel, 2005)
Research questions
1. Do the different roles identified impose different requirements on the quality of speech synthesis?
2. Does comprehensibility account for acceptability for use in CALL?
Method
– 17 French teachers
– One research TTS system, FIPSvox from the University of Geneva
– 3 roles: (1) reading machine, (2) pronunciation model, and (3) conversational partner
– Likert scales: (1) comprehensibility, (2) acceptability, and (3) appropriateness
– Word pointing paradigm (van Santen, 1993)
Results
1. Most suitable as a dialogue partner; least suitable as a pronunciation model.
2. Comprehensibility is not the only requirement: accuracy and naturalness matter, as do register and expressiveness.

Is TTS synthesis ready for use in CALL?
Research questions
– Do the different roles identified impose different requirements on the quality of speech synthesis?
– Is TTS synthesis ready for use in CALL?
Design
– Within subjects, N = 17, French teachers
– Dependent variables
  • Quality of the speech output
  • Acceptability
  • Adequacy
– Independent variables
  • Role of TTS in CALL: (1) reading machine (RM), (2) pronunciation model (PM) at the (a) segmental and (b) suprasegmental level, and (3) conversational partner (CP)
  • TTS synthesis system

Systems evaluated
Demo sites (French and English voices):
– http://www.research.att.com/~ttsweb/tts/demo.php#top
– http://www.multitel.be/TTS/layout.php?page=eLite_demo
– http://212.8.184.250/tts/demo_login.jsp
– http://www.acapela-group.com/text-to-speech-interactive-demo.html

Questionnaire
MOS-CALL
– ITU-T Overall Quality Test
– MOS-X (Polkosky and Lewis, 2003)
On-line presentation of the questionnaire

Is TTS synthesis ready for use in CALL?
(Chart: Mean ratings of adequacy)
– Different TTS synthesis systems are most suitable for use in different roles
– This reinforces the need to evaluate every TTS synthesis system
(Chart: Mean ratings of acceptability)
– System 4 is ready for use in all applications where TTS synthesis adds value

System 1: AT&T Next-Gen (Alain)
(Chart: Mean ratings of quality of output)

System 2: Nuance Vocalizer (Julie)
(Chart: Mean ratings of quality of output)

System 3: eLite (Vincent)
(Chart: Mean ratings of quality of output)

Do the different roles have different requirements?
(Chart: Mean ratings of adequacy)
– Differences in adequacy were statistically significant for systems 2 and 4 (χ²r = 8.010, df = 3, p = 0.046; χ²r = 8.063, df = 3, p = 0.045, respectively)
– But not for systems 1 and 3 (χ²r = 2.352, df = 3, p = 0.503; χ²r = 3.467, df = 3, p = 0.325, respectively)
(Chart: Mean ratings of acceptability)
– Differences in acceptability were not significant (system 1: χ²r = 6.616, df = 3, p = 0.085; system 2: χ²r = 6.303, df = 3, p = 0.098; system 3: χ²r = 3.194, df = 3, p = 0.363; system 4: χ²r = 5.547, df = 3, p = 0.163)

Conclusions
– Some French TTS synthesis systems are reaching readiness for use in CALL, in applications where TTS synthesis adds value
– To fully meet the requirements of CALL, more attention needs to be paid to accuracy and naturalness, in particular at the prosodic level, and to expressiveness
  • Expressive speech synthesis is the focus of much current research (Campbell et al., 2006)
– The same may not hold for all languages; different languages pose different problems for TTS synthesis
– It will not be long before learners are able to benefit from the support of an untiring, non-judgemental substitute native speaker, available 24/7 in CALL applications.
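
The χ²r statistics reported for adequacy and acceptability all have df = 3, which matches four within-subjects conditions (RM, PM segmental, PM suprasegmental, CP) rated by the same 17 teachers. The slides do not name the test; χ²r is the conventional symbol for Friedman's rank test, so the sketch below assumes that test. It computes the uncorrected Friedman statistic from a raters × roles matrix of Likert ratings, and converts a χ² value with 3 degrees of freedom to a p-value using the closed-form upper tail erfc(√(x/2)) + √(2x/π)·e^(−x/2). The function names and the toy ratings are hypothetical illustrations, not the study's data or analysis code.

```python
import math


def average_ranks(row):
    """Rank one rater's scores across roles (1 = lowest); tied scores share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores that starts at sorted position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2r(ratings):
    """Friedman statistic for an n_raters x k_conditions matrix (no tie correction)."""
    n = len(ratings)
    k = len(ratings[0])
    rank_sums = [0.0] * k
    for row in ratings:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df3(x):
    """Upper-tail p-value P(X >= x) for a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


if __name__ == "__main__":
    # Adequacy statistics reported above for systems 2 and 4.
    for chi2r in (8.010, 8.063):
        print(f"chi2r = {chi2r:.3f}, df = 3, p = {chi2_sf_df3(chi2r):.3f}")

    # Hypothetical ratings (3 raters x 4 roles: RM, PM-seg, PM-supra, CP), for illustration only.
    toy = [[4, 2, 2, 5], [3, 1, 2, 4], [5, 3, 2, 4]]
    stat = friedman_chi2r(toy)
    print(f"toy chi2r = {stat:.3f}, p = {chi2_sf_df3(stat):.3f}")
```

Applied to the reported adequacy statistics, `chi2_sf_df3` gives p-values that round to 0.046 and 0.045 for systems 2 and 4. With real data one would pass one system's 17 × 4 rating matrix; packages such as SciPy's `scipy.stats.friedmanchisquare` also correct for tied ranks, which are frequent with Likert scales, so their χ²r can be somewhat larger than this uncorrected value.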