Is Text-to-Speech Synthesis Ready for use in CALL?

Zoë Handley
Learning Sciences Research Institute
(LSRI), University of Nottingham
and
Marie-Josée Hamel
Department of French, Dalhousie
University
CALL 2008, Antwerp, Belgium
Plan





TTS synthesis in CALL
Evaluation
Requirements analysis
Readiness of TTS synthesis for CALL
Conclusions
TTS synthesis


What is TTS synthesis?
– Speech synthesis “systems … allow the generation of novel messages, either from scratch (i.e. entirely by rule) or by re-combining shorter prestored units” (van Bezooijen and van Heuven, 1997: 709)
– Text-to-Speech (TTS) synthesis systems allow the automatic generation of speech from text
Why use TTS in CALL?
– There is a general need in language learning and teaching for “self-paced interactive learning environments” which provide “controlled interactive speaking practice outside the classroom” (Ehsani and Knodt, 1998: 45).
Demo: http://www.acapela-group.com/text-to-speech-interactive-demo.html
Voices: Graham, Lucy
CALL Applications

Reading machine
– Talking dictionaries, texts (de Pijper, 1997; Hamel, 2003), word processors and conjugators; dictations (Santiago-Oriola, 1999; Mercier et al., 1999); and grapheme-phoneme exercises
– e.g. Oxford Hachette 4 French Dictionary on CD-ROM
Pronunciation model
– Auditory discrimination; repetition (Hamel, 1998; Mercier et al., 2000)
Conversational partner
– In combination with automatic speech recognition and speech understanding, the generative power of TTS synthesis can be harnessed to provide learners with interactive speaking practice, i.e. a dialogue partner (Raux and Eskenazi, 2004; Seneff et al., 2004)
Benefits of TTS synthesis

Improvements on other
media
– Easy creation and editing of
speech samples
– Simultaneous presentation
of text and speech
– Low storage requirements
– Non-human and therefore
perceived as non-judgemental

Adds value
– Generation of examples
on demand (Sherwood,
1981) and therefore the
automatic generation of
feedback, conversational
turns, and exercises with
speech models
Why evaluation?

Few CALL applications integrating TTS synthesis are available
on the market

Few evaluations of TTS synthesis for the purposes of CALL
have been conducted

Since the failure of the language laboratory, teachers have been
sceptical about unevaluated technologies

TTS synthesis is being used in CALL in roles it has not assumed
in applications outside CALL: outside CALL, the most common,
perhaps only, role of TTS synthesis is that of a reading machine
Framework for the evaluation of TTS synthesis
for use in CALL (Handley and Hamel, 2005)
1. Basic research evaluation of TTS synthesis for use in CALL
– Viability and potential benefits of the use of TTS synthesis in CALL
2. Technology evaluation of TTS synthesis for use in CALL
– Adequacy of TTS synthesis for use in CALL
3. Judgemental evaluation of the CALL application
– Potential of the CALL program to provide ideal conditions for SLA
4. Judgemental evaluation of the teacher-planned activity
– Potential of the planned activity to provide ideal conditions for SLA
5. Usage evaluation of the teacher-planned activity
– Learner’s performance in the planned activity
This is a combination of the levels of evaluation recommended by Chapelle
(2001) for the evaluation of CALL activities and by ELSE (1999) for the
evaluation of Speech and Language Technologies (SALT).
Evaluations of TTS Synthesis for
CALL

Technology evaluations of TTS synthesis for use in CALL
• Stratil et al. (1987)
– Evaluated the quality of a Spanish TTS chip for presenting grammar exercises in a language laboratory.

Usage evaluation of the teacher-planned activity
– Outcome-oriented
• Santiago-Oriola (1999)
– Evaluated the use of a French TTS synthesiser for presenting dictation exercises.
• Hincks (2002)
– Evaluated the use of a Swedish TTS synthesiser in combination with a speech editor (resynthesis) for teaching the lexical stress of English to Swedophones.
– Process-oriented
• Cohen (1993)
– Evaluated the use of a talking word processor to support literacy activities, namely
writing stories, for young learners of French as a second language.
Requirements analysis

The evaluation process
– ISO (1999) and EAGLES
(1999) guidelines
– Establish the evaluation
requirements
• Establish the purpose of the
evaluation
• Identify the types of products to
be evaluated
• Specify the quality model
– Specify the evaluation
• Select metrics
• Establish rating levels for metrics
• Establish criteria for assessment
– Design the evaluation
– Execute the evaluation

CALL requirements
“When the language competence of the system begins to outstrip that of some of the better second language users, such systems become useful adjunct tools” (Keller and Zellner-Keller, 2000)
CALL requirements analysis

Ideal conditions for Second Language Acquisition (SLA)
(Chapelle, 2001)
– Language learning potential
• Goals of SLA
– Communicative competence
– Quality of the output
– Primary requirement: comprehensibility/intelligibility
– Secondary requirements: accuracy and naturalness, at both the level of individual speech sounds and the prosodic level
• Focus on form
– Flexibility
– Speech rate, pitch
Explorative investigation (Handley and Hamel,
2005)

Research questions
1. Do the different roles identified impose different requirements on the quality of speech synthesis?
2. Does comprehensibility account for acceptability for use in CALL?

Method
– 17 French teachers
– One research TTS system, FIPSvox from the University of Geneva
– 3 roles: (1) reading machine, (2) pronunciation model, and (3) conversational partner
– Likert scales: (1) comprehensibility, (2) acceptability, and (3) appropriateness
– Word pointing paradigm (van Santen, 1993)

Results
1. Most suitable as a dialogue partner; least suitable as a pronunciation model.
2. Comprehensibility is not the only requirement: accuracy and naturalness matter, as do register and expressiveness.
Is TTS synthesis ready for use in CALL?

Research questions
– Do the different roles identified impose different requirements on the quality
of speech synthesis?
– Is TTS synthesis ready for use in CALL?

Design
– Within-subjects design, N = 17 French teachers
– Dependent variables
• Quality of the speech output
• Acceptability
• Adequacy
– Independent variables
• Role of TTS in CALL: (1) Reading Machine (RM), (2) Pronunciation Model (PM) at the (a) segmental level and (b) suprasegmental level, and (3) Conversational Partner (CP)
• TTS synthesis system
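The within-subjects design above crosses each teacher with every TTS system and every role, and collects three dependent variables per cell. One way to hold such data and compute the per-cell mean ratings reported on the following slides is sketched below in Python. This is a minimal sketch under stated assumptions: the record layout, the field names, the role labels `PM-seg`/`PM-supra`, and the helpers `cell_means` and `within_subjects_matrix` are all hypothetical, and the Likert range is not given on the slides.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Levels of the "role" factor; PM is split into segmental and suprasegmental.
# These short labels are hypothetical, chosen for illustration only.
ROLES = ("RM", "PM-seg", "PM-supra", "CP")


@dataclass(frozen=True)
class Rating:
    teacher: int  # participant id (N = 17 in the study)
    system: int   # TTS system, 1..4
    role: str     # one of ROLES
    measure: str  # dependent variable: "quality", "acceptability" or "adequacy"
    score: int    # Likert response; the scale range is an assumption


def cell_means(ratings, measure):
    """Mean score of one dependent variable for every (system, role) cell."""
    cells = defaultdict(list)
    for r in ratings:
        if r.measure == measure:
            cells[(r.system, r.role)].append(r.score)
    return {cell: mean(scores) for cell, scores in cells.items()}


def within_subjects_matrix(ratings, system, measure):
    """Teachers x roles matrix (columns in ROLES order) for one system and measure.

    Only teachers who rated every role are kept, giving the complete blocks
    a repeated-measures rank test needs.
    """
    by_teacher = defaultdict(dict)
    for r in ratings:
        if r.system == system and r.measure == measure:
            by_teacher[r.teacher][r.role] = r.score
    return [[row[role] for role in ROLES]
            for _, row in sorted(by_teacher.items())
            if all(role in row for role in ROLES)]
```

Dropping teachers with missing cells in `within_subjects_matrix` is a design choice of this sketch, not something the slides report; an actual analysis might handle incomplete responses differently.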
Systems evaluated
1.
http://www.research.att.com/~ttsweb/tts/demo.php#top

2.
French
English
http://www.multitel.be/TTS/layout.php?page=eLite_demo

4.
English
http://212.8.184.250/tts/demo_login.jsp

3.
French
French
English
http://www.acapela-group.com/text-to-speech-interactive-demo.html

French
English
Questionnaire

MOS-CALL
– ITU-T Overall Quality Test
– MOS-X (Polkosky and Lewis, 2003)
On-line presentation of
questionnaire
Is TTS synthesis ready for use in CALL?
Mean ratings of adequacy

Different TTS synthesis systems
are most suitable for use in
different roles
→ Reinforces the need to evaluate
every TTS synthesis system
Mean ratings of acceptability

System 4 is ready for use in all
applications where TTS
synthesis adds value
System 1: AT&T Next-Gen (Alain)
Mean ratings of quality of output
System 2: Nuance Vocalizer (Julie)
Mean ratings of quality of output
System 3: eLite (Vincent)
Mean ratings of quality of output
Do the different roles have different
requirements?
Mean ratings of adequacy


Differences in adequacy were statistically significant for systems 2 and 4 (χ²r = 8.010, df = 3, p = 0.046; χ²r = 8.063, df = 3, p = 0.045, respectively), but not for systems 1 and 3 (χ²r = 2.352, df = 3, p = 0.503; χ²r = 3.467, df = 3, p = 0.325, respectively)
Mean ratings of acceptability

Differences in acceptability were not
significant (system 1 χ²r = 6.616, df
= 3, p = 0.085, system 2 χ²r =
6.303, df = 3, p = 0.098, system 3
χ²r = 3.194, df = 3, p = 0.363, and
system 4 χ²r = 5.547, df = 3, p =
0.163)
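The χ²r statistics with df = 3 are consistent with a Friedman rank test comparing the four role conditions (RM, PM segmental, PM suprasegmental, CP) within each system, since k = 4 conditions give df = k − 1 = 3. The slides do not name the test, so treating χ²r as Friedman's statistic is an assumption. The pure-Python sketch below computes the Friedman statistic with the usual correction for tied ranks (the same correction SciPy's `scipy.stats.friedmanchisquare` applies) and an upper-tail chi-square p-value. The function names `average_ranks`, `chi2_sf` and `friedman` are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank one subject's ratings 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # 1-based average rank of the run
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), computed as 1 - P(df/2, x/2), where P is
    the regularized lower incomplete gamma function evaluated by its power series."""
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    term = total = 1.0 / s  # n = 0 term of sum z^n / (s (s+1) ... (s+n))
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (s + n)
        total += term
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def friedman(ratings):
    """Friedman test; ratings[i][j] is subject i's rating of condition j.

    Returns (chi2_r, df, p) with the standard tie correction.
    Assumes the reported chi2_r is Friedman's statistic (the slides do not say so).
    """
    n, k = len(ratings), len(ratings[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in ratings:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        # Sum of t^3 - t over groups of t tied ratings within this subject.
        tie_term += sum(t ** 3 - t for t in Counter(row).values())
    df = k - 1
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction <= 1e-12:  # every subject rated all conditions identically
        return 0.0, df, 1.0
    stat = (12 / (n * k * (k + 1)) * sum(R * R for R in rank_sums)
            - 3 * n * (k + 1)) / correction
    return stat, df, chi2_sf(stat, df)
```

For example, three hypothetical teachers who rank the four roles identically give `friedman([[1, 2, 3, 4]] * 3)` ≈ (9.0, 3, 0.029). As a check on the p-value routine, `chi2_sf(8.010, 3)` comes out at about 0.046, in line with the value reported above for the adequacy ratings of system 2.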
Conclusions

Some French TTS synthesis systems are reaching readiness for
use in CALL, in applications where TTS synthesis adds value

In order to fully meet the requirements of CALL more attention
needs to be paid to accuracy and naturalness, in particular at
the prosodic level, and expressiveness
– Expressive speech synthesis is the focus of much current research
(Campbell et al., 2006)

This may not be the case for all languages; different languages
pose different problems for TTS synthesis

It will not be long before learners can benefit from
the support of an untiring, non-judgemental substitute
native speaker, available 24/7, in CALL applications.
