Faculty of Translation and Interpreting
Course Syllabus
2015-2016 Academic Year

Speech Technologies (23204)
Degree programme: Degree in Translation and Interpreting
Year in the programme: Fourth
Term: Third
Number of ECTS credits: 4
Hours of student dedication to the course: 100
Course type: Optional
Plenary session teacher: Juan María Garrido Almiñana
Language of instruction: Catalan
1. Course presentation
The course is conceived as an introduction to speech technologies and to the linguistic work involved in their development, especially in professional environments. The ultimate goal is for students to master the basic concepts related to these technologies and to acquire basic practical skills in using the speech processing tools employed in this field.
2. Competences to be attained
Competences
● G.6. Computer skills
● G.13. Ability to work individually and as a team
● G.14. Ability to work in international and interdisciplinary contexts
● G.17. Application of knowledge into practice
● G.20. Ability for autonomous and continuous learning
● E.5. Expertise in one or more subject areas
● E.9. Ability to think about language processes

Learning results
● To know the main computer tools used in speech processing
● To carry out the proposed practical activities
● To have a general view of the interdisciplinary process of developing the main speech technologies
● To apply linguistic knowledge to the development of speech technologies
● To learn autonomously through the realisation of the proposed practical activities
● To master the basic concepts related to speech technologies (speech coding, synthesis and recognition)
● To acquire the basic skills for the computational processing of speech (A/D conversion, storing, speech coding)
● To extract linguistic conclusions from speech processing operations
3. Course contents
1. Introduction: speech technologies
Speech technologies: text-to-speech, speech recognition, dialogue systems. The development of
speech technologies in professional environments. The relationship between language and
speech technologies.
2. Basic concepts
2.1. Speech signals
Acoustic model of speech production: source and filter. Basic characterization of speech signals:
time, amplitude and frequency; periodic and aperiodic signals. Spectral composition. Basic
methods of representation of speech signals. The identification of speech sounds: vowels and
consonants; place and manner of articulation; voicing. Acoustic correlates of prosody:
fundamental frequency (F0), duration, pauses and amplitude.
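As an illustration of these concepts, the following minimal Python sketch (with invented example values: a 120 Hz fundamental sampled at 16 kHz) builds a periodic, vowel-like signal as a sum of harmonics and recovers its F0 from the magnitude spectrum.

import numpy as np

# A minimal sketch: build a periodic, vowel-like signal as a sum of
# harmonics and recover its fundamental frequency (F0) from the spectrum.
# The F0 (120 Hz) and sampling rate (16 kHz) are arbitrary example values.
fs = 16000                       # sampling frequency in Hz
f0 = 120.0                       # fundamental frequency in Hz
t = np.arange(0, 0.5, 1.0 / fs)  # half a second of signal

# Sum of the first five harmonics with decreasing amplitude (periodic signal)
signal = sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 6))

# Spectral composition: magnitude spectrum via the discrete Fourier transform
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)

# The strongest spectral peak corresponds to the fundamental frequency
estimated_f0 = freqs[np.argmax(spectrum)]
print(f"Estimated F0: {estimated_f0:.1f} Hz")  # ~120 Hz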
2.2. Digital processing of speech
Analog and digital signals. Analog-to-digital (A/D) conversion. Concept of sampling. Sampling
frequency. Resolution of an A/D converter. Saturation. Coding.
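A minimal Python sketch of sampling and quantization, assuming arbitrary example values (an 8 kHz sampling frequency and 8-bit resolution):

import numpy as np

# A minimal sketch of A/D conversion: sample a sine wave and quantize it.
# Sampling rate (8 kHz) and resolution (8 bits) are arbitrary example values.
fs = 8000                         # sampling frequency in Hz
bits = 8                          # resolution of the A/D converter
t = np.arange(0, 0.01, 1.0 / fs)  # sampling instants (10 ms of signal)

analog = 0.8 * np.sin(2 * np.pi * 440 * t)   # "analog" 440 Hz tone

# Quantization: map each sample to one of 2**bits discrete levels in [-1, 1]
levels = 2 ** bits
quantized = np.round((analog + 1) / 2 * (levels - 1))   # integer codes 0..255
decoded = quantized / (levels - 1) * 2 - 1              # back to [-1, 1]

# Saturation would occur if the input exceeded the [-1, 1] range
print(f"{len(t)} samples, {levels} quantization levels")
print(f"Max quantization error: {np.max(np.abs(decoded - analog)):.4f}")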
3. Text-to-speech systems
3.1. What is a text-to-speech system
Concept of text-to-speech. Structure of a text-to-speech system: linguistic processing and
synthesis. Major commercial systems. Multilingual systems. Applications. Linguistic processing to
convert text to speech: preprocessing, linguistic analysis, prosodic segmentation, phonetic
transcription, stress prediction. The synthesis process in commercial systems: unit concatenation
synthesis. Phases: prediction of prosody, selection of units for synthesis. Signal modification.
Speakers for synthesis: speech databases and prosodic models. The process of creating
synthetic speakers.
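A minimal Python sketch of this pipeline, in which every stage is a hypothetical placeholder rather than a real commercial module:

# A minimal sketch of a text-to-speech pipeline: linguistic processing
# followed by concatenative synthesis. All functions are illustrative
# placeholders, not an actual commercial system.

def preprocess(text):
    # Text preprocessing: expand abbreviations, normalise case (toy version)
    return text.replace("Dr.", "Doctor").lower()

def phonetic_transcription(text):
    # Grapheme-to-phoneme conversion via a toy lookup table
    lexicon = {"hello": "h @ l ou", "doctor": "d o k t @"}
    return [ph for word in text.split() for ph in lexicon.get(word, word).split()]

def predict_prosody(phones):
    # Prosody prediction: assign a duration and an F0 target per phone (toy values)
    return [(p, 80, 120.0) for p in phones]   # (phone, ms, Hz)

def select_units(prosodic_targets):
    # Unit selection: pick recorded units from a speech database that best
    # match the targets; here just a placeholder list of unit identifiers
    return [f"unit<{phone}>" for phone, dur, f0 in prosodic_targets]

def synthesize(text):
    # Full pipeline: preprocessing -> transcription -> prosody -> unit selection
    phones = phonetic_transcription(preprocess(text))
    targets = predict_prosody(phones)
    return select_units(targets)

print(synthesize("hello Dr."))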
3.2. Developing text-to-speech systems
Including a new language in a text-to-speech system. The development of a language processing
module. The creation of synthetic speakers. Evaluation.
4. Speech recognition
4.1. What is a speech recognition system
Concept of speech recognition. Structure of a speech recognition system: parameterization,
acoustic recognition and linguistic post-processing. Major commercial systems. Multilingual
recognizers. Applications. Parameterization. The recognition process in commercial systems:
Markov models. The process of developing acoustic models. Acoustic post-processing: phonetic
dictionaries. Linguistic post-processing: language models and state grammars.
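A minimal Python sketch of Viterbi decoding over a toy hidden Markov model, the dynamic-programming step at the heart of acoustic recognition; the two states, the observations and the probabilities are invented for the example:

import numpy as np

# A minimal sketch of Viterbi decoding over a toy hidden Markov model.
# States, observations and probabilities are invented example values;
# real recognizers use per-phone HMMs over acoustic feature vectors.
states = ["a", "s"]                          # toy phone states
obs = [0, 1, 1]                              # toy discrete acoustic observations
start = np.log([0.6, 0.4])                   # log initial probabilities
trans = np.log([[0.7, 0.3],                  # log transition probabilities
                [0.4, 0.6]])
emit = np.log([[0.9, 0.1],                   # log emission probabilities
               [0.2, 0.8]])                  # P(observation | state)

# Dynamic programming: best log-probability of each state at each time step
vit = np.full((len(obs), len(states)), -np.inf)
back = np.zeros((len(obs), len(states)), dtype=int)
vit[0] = start + emit[:, obs[0]]
for t in range(1, len(obs)):
    for j in range(len(states)):
        scores = vit[t - 1] + trans[:, j]
        back[t, j] = np.argmax(scores)
        vit[t, j] = scores[back[t, j]] + emit[j, obs[t]]

# Backtrace the most likely state sequence
path = [int(np.argmax(vit[-1]))]
for t in range(len(obs) - 1, 0, -1):
    path.insert(0, back[t, path[0]])
print([states[i] for i in path])             # most likely phone sequence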
4.2. Developing speech recognition systems
Including a new language in a speech recognition system: corpus collection, creating acoustic
models, dictionary creation, creation of language models.
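A minimal Python sketch of one of these linguistic tasks, estimating a bigram language model from a tiny invented corpus (real systems use much larger corpora and smoothing techniques):

from collections import Counter

# A minimal sketch of language model creation: estimate bigram probabilities
# from a tiny invented training corpus.
corpus = [
    "call the office",
    "call the doctor",
    "send the report",
]

bigrams = Counter()
unigrams = Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(words[:-1])
    bigrams.update(zip(words[:-1], words[1:]))

def bigram_prob(w1, w2):
    # Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(bigram_prob("call", "the"))    # 1.0: "call" is always followed by "the"
print(bigram_prob("the", "office"))  # ~0.33: one of three continuations of "the"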
5. Dialogue systems
5.1. What is a dialogue system
Structure of a dialogue system: speech recognition; speech understanding; dialogue
management; generation of response messages; text-to-speech. Commercial systems. Main
applications. Speech understanding: semantic analysis. Dialogue management. The generation
of the response message: text generation.
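A minimal Python sketch of this architecture, with speech recognition and text-to-speech replaced by plain text input/output and with invented intents and rules for the understanding, dialogue management and response generation stages:

# A minimal sketch of a dialogue system pipeline: understanding, dialogue
# management and response generation. Speech recognition and text-to-speech
# are replaced here by plain text; intents and rules are invented examples.

def understand(utterance):
    # Speech understanding: toy semantic analysis mapping keywords to intents
    text = utterance.lower()
    if "ticket" in text:
        return {"intent": "buy_ticket"}
    if "time" in text:
        return {"intent": "ask_time"}
    return {"intent": "unknown"}

def manage_dialogue(semantics, state):
    # Dialogue management: decide the next action from the intent and state
    intent = semantics["intent"]
    if intent == "buy_ticket":
        state["pending"] = "destination"
        return "ask_destination"
    if intent == "ask_time":
        return "give_time"
    return "ask_repeat"

def generate_response(action):
    # Response generation: toy text generation from the chosen action
    templates = {
        "ask_destination": "Where would you like to travel to?",
        "give_time": "The next departure is at 10:30.",
        "ask_repeat": "Sorry, could you repeat that?",
    }
    return templates[action]

state = {}
for utterance in ["I need a ticket", "What time is it?"]:
    action = manage_dialogue(understand(utterance), state)
    print(generate_response(action))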
5.2. Developing applications using dialogue systems
The process of creating a commercial dialogue system. Linguistic tasks.
4. Evaluation and reassessment
The final grade will be the result of the marks obtained in:
● two graded activities, to be solved individually, one in the middle and one at the end of the course (20% of the final grade each);
● a theoretical/practical exam at the end of the course (60% of the grade).
Submitting all the evaluable activities proposed in the practical sessions is also a prerequisite to be graded.
Reassessment
Evaluation activities (competences evaluated), percentage of the final mark, and reassessment conditions:
● Graded activity 1 (G.6, G.13, G.14, G.17, G.20, E.5, E.9): 20% of the final mark. It can be made up through the realisation of a new activity of the same type (20% of the final mark in the reassessment).
● Graded activity 2 (G.6, G.13, G.14, G.17, G.20, E.5, E.9): 20% of the final mark. It can be made up through the realisation of a new activity of the same type (20% of the final mark in the reassessment).
● Exam (G.6, G.13, G.14, G.17, G.20, E.5, E.9): 60% of the final mark. It can be made up through the realisation of a practical work (60% of the final mark in the reassessment).
5. Methodology: training activities
The course is structured around two axes:
● ‘large group’ classes (15 hours), in which the different theoretical concepts will be introduced;
● seminar sessions (10 hours), during which students will carry out, individually or in groups and with the support of the teacher, a series of activities to reinforce the theoretical concepts worked on in the theoretical sessions and to achieve the practical competences established for the course. These activities will not be graded, but submitting them is a mandatory prerequisite to be graded.
Students must complete their dedication to the course with approximately 55 hours of work outside these sessions, devoted to the non-graded practical activities and to the graded activities.
6. Basic course bibliography
Basic
● FURUI, S. (2001). Digital Speech Processing, Synthesis and Recognition (Second Edition, Revised and Expanded). New York: Marcel Dekker, Inc.
● LÓPEZ-CÓZAR, R.; ARAKI, M. (2005). Spoken, Multilingual and Multimodal Dialogue Systems: Development and Assessment. Chichester: John Wiley & Sons.
● O'SHAUGHNESSY, D. (1987). Speech Communication: Human and Machine. Addison-Wesley Series in Electrical Engineering (2nd edition, 2000).
● SCHROEDER, M. R. (1999). Computer Speech: Recognition, Compression, Synthesis. Springer-Verlag.
● TAYLOR, P. (2009). Text-to-Speech Synthesis. Cambridge: Cambridge University Press.

Complementary
● GIACHIN, E.; McGLASHAN, S. (1997). "Spoken Language Systems", in YOUNG, S.; BLOOTHOOFT, G. (eds.), Corpus-Based Methods in Language and Speech Processing. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 2), pp. 69-117.
● LADEFOGED, P. (2003). Phonetic Data Analysis: An Introduction to Fieldwork and Instrumental Techniques. Malden: Blackwell.
● MARTÍNEZ-CELDRÁN, E. (1998). Análisis espectrográfico de los sonidos del habla. Barcelona: Ariel.
● PRIETO, P. (2004). Fonètica i fonologia catalanes. Els sons del català. Barcelona: Edicions de la UOC.
● QUILIS, A. (1981). Fonética acústica de la lengua española. Madrid: Gredos (Biblioteca Románica Hispánica, Manuales, 49).