Faculty of Translation and Interpreting
Course Syllabus 2015-2016 Academic Year

Speech Technologies (23204)

Degree programme: Degree in Translation and Interpreting
Year in the programme: Fourth
Term: Third
Number of ECTS credits: 4
Hours of student dedication to the course: 100
Course type: Optional
Plenary session teacher: Juan María Garrido Almiñana
Language of instruction: Catalan

1. Course presentation

The course is conceived as an introduction to speech technologies and to the linguistic work involved in their development, especially in professional environments. The ultimate goal is for students to master the basic concepts related to these technologies and to acquire basic practice with the speech processing tools used in this field.

2. Competences to be attained

Learning results:
● To know the main computer tools used in speech processing
● To carry out the proposed practical activities
● To have a general view of the interdisciplinary process of developing the main speech technologies
● To apply linguistic knowledge to the development of speech technologies
● To learn autonomously through the realisation of the proposed practical activities
● To master the basic concepts related to speech technologies (speech coding, synthesis and recognition)
● To acquire the basic skills for the computational processing of speech (A/D conversion, storage, speech coding)
● To extract linguistic conclusions from speech processing operations

Competences:
● G.6. Computer skills
● G.13. Ability to work individually and as a team
● G.14. Ability to work in international and interdisciplinary contexts
● G.17. Application of knowledge into practice
● G.20. Ability for autonomous and continuous learning
● E.5. Expertise in one or more subject areas
● E.9. Ability to think about language processes

3. Course contents

1. Introduction: speech technologies
Speech technologies: text-to-speech, speech recognition, dialogue systems. The development of speech technologies in professional environments. The relationship between language and speech technologies.

2. Basic concepts

2.1. Speech signals
Acoustic model of speech production: source and filter. Basic characterization of speech signals: time, amplitude and frequency; periodic and aperiodic signals. Spectral composition. Basic methods of representation of speech signals. The identification of speech sounds: vowels and consonants; place and manner of articulation; voicing. Acoustic correlates of prosody: fundamental frequency (F0), duration, pauses and amplitude.

2.2. Digital processing of speech
Analog and digital signals. Analog-to-digital (A/D) conversion. The concept of sampling. Sampling frequency. Resolution of an A/D converter. Saturation. Coding.

3. Text-to-speech systems

3.1. What is a text-to-speech system
Concept of text-to-speech. Structure of a text-to-speech system: linguistic processing and synthesis. Major commercial systems. Multilingual systems. Applications. Linguistic processing to convert text to speech: preprocessing, linguistic analysis, prosodic segmentation, phonetic transcription, stress prediction (a minimal illustrative sketch of this pipeline is given after point 3.2 below). The synthesis process in commercial systems: unit concatenation synthesis. Phases: prediction of prosody, selection of units for synthesis. Signal modification. Voices for synthesis: speech databases and prosodic models. The process of creating synthetic voices.

3.2. Developing text-to-speech systems
Including a new language in a text-to-speech system. The development of a language processing module. The creation of synthetic voices. Evaluation.
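The following minimal Python sketch, which is not part of the official course materials, illustrates the kind of linguistic-processing front end described in point 3.1: text preprocessing followed by dictionary-based phonetic transcription. The tiny lexicon, the number-expansion table and the example sentence are all invented for illustration; a real text-to-speech system would use much richer resources and letter-to-sound rules for unknown words.

# Minimal, didactic sketch of the linguistic-processing front end of a
# text-to-speech system: preprocessing (lowercasing, punctuation stripping,
# number expansion) followed by dictionary-based phonetic transcription.
# The tiny lexicon below is hypothetical and only covers the example sentence.

TOY_LEXICON = {            # hypothetical pronunciation entries (IPA-like)
    "bon": "bon",
    "dia": "di.a",
    "a": "a",
    "les": "les",
    "nou": "nɔw",
    "hores": "o.ɾes",
}

NUMBERS = {"9": "nou"}     # toy text-normalisation table for digit expansion

def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation and expand digits into words."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?")
        tokens.append(NUMBERS.get(word, word))
    return tokens

def transcribe(tokens: list[str]) -> list[str]:
    """Look each word up in the lexicon; unknown words are left as-is and
    marked with '?' (a real system would fall back to letter-to-sound rules)."""
    return [TOY_LEXICON.get(t, t + "?") for t in tokens]

if __name__ == "__main__":
    sentence = "Bon dia a les 9 hores."
    print(transcribe(preprocess(sentence)))
    # ['bon', 'di.a', 'a', 'les', 'nɔw', 'o.ɾes']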
4. Speech recognition

4.1. What is a speech recognition system
Concept of speech recognition. Structure of a speech recognition system: parameterization, acoustic recognition and linguistic post-processing. Major commercial systems. Multilingual recognizers. Applications. Parameterization. The recognition process in commercial systems: Markov models (a minimal decoding sketch over such a model is given at the end of this section). The process of developing acoustic models. Acoustic post-processing: phonetic dictionaries. Linguistic post-processing: language models and state grammars.

4.2. Developing speech recognition systems
Including a new language in a speech recognition system: corpus collection, creation of acoustic models, dictionary creation, creation of language models.

5. Dialogue systems

5.1. What is a dialogue system
Structure of a dialogue system: speech recognition; speech understanding; dialogue management; generation of response messages; text-to-speech. Commercial systems. Main applications. Speech understanding: semantic analysis. Dialogue management. The generation of the response message: text generation.

5.2. Developing applications using dialogue systems
The process of creating a commercial dialogue system. Linguistic tasks.
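The sketch below, which is purely illustrative and not part of the official course materials, shows Viterbi decoding over a tiny hand-made hidden Markov model of the kind mentioned in point 4.1. The states, observation labels and probabilities are all invented for the example; real recognizers work with many context-dependent states and continuous acoustic features.

# Minimal, didactic Viterbi decoding over a tiny hidden Markov model (HMM).
# All states, observation symbols and probabilities are invented for illustration.

STATES = ["s", "z"]                      # hypothetical phone-like states
START = {"s": 0.6, "z": 0.4}             # initial state probabilities
TRANS = {"s": {"s": 0.7, "z": 0.3},      # transition probabilities
         "z": {"s": 0.4, "z": 0.6}}
EMIT = {"s": {"lo": 0.8, "hi": 0.2},     # emission probabilities for two
        "z": {"lo": 0.3, "hi": 0.7}}     # coarse acoustic observation labels

def viterbi(observations):
    """Return the most likely state sequence and its probability."""
    # probability and backpointer tables, one column per observation
    prob = [{s: START[s] * EMIT[s][observations[0]] for s in STATES}]
    back = [{}]
    for t in range(1, len(observations)):
        prob.append({})
        back.append({})
        for s in STATES:
            best_prev = max(STATES, key=lambda p: prob[t - 1][p] * TRANS[p][s])
            prob[t][s] = prob[t - 1][best_prev] * TRANS[best_prev][s] * EMIT[s][observations[t]]
            back[t][s] = best_prev
    # backtrack from the best final state
    last = max(STATES, key=lambda s: prob[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return path, prob[-1][last]

if __name__ == "__main__":
    obs = ["lo", "lo", "hi"]             # a toy sequence of acoustic labels
    print(viterbi(obs))                  # (['s', 's', 'z'], 0.056448)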
4. Evaluation and reassessment

The final grade will be the result of the marks obtained in:
● two graded activities, to be completed individually, one in the middle and one at the end of the course (20% of the final grade each);
● a theoretical/practical exam at the end of the course (60% of the grade).

In order to be graded, students must also submit all the evaluable activities proposed in the practical sessions.

Reassessment

Evaluation activities (competences evaluated), percentage of the final mark, and how each can be made up:
● Graded activity 1 (G.6, G.13, G.14, G.17, G.20, E.5, E.9): 20% of the final mark. It can be made up through the realisation of a new activity of the same type.
● Graded activity 2 (G.6, G.13, G.14, G.17, G.20, E.5, E.9): 20% of the final mark. It can be made up through the realisation of a new activity of the same type.
● Exam (G.6, G.13, G.14, G.17, G.20, E.5, E.9): 60% of the final mark. It can be made up through the realisation of a practical work.

5. Methodology: training activities

The course is structured around two axes:
● 'large group' classes (15 hours), in which the theoretical concepts will be introduced;
● seminar sessions (10 hours), during which students will carry out, individually or in groups and with the support of the teacher, a series of activities designed to reinforce the theoretical concepts covered in the theoretical sessions and to develop the practical competences established for the course. These activities will not be graded, but submitting them is a mandatory prerequisite for being graded.

Students must complete their dedication to the subject with a number of hours of work outside these sessions (approximately 55), devoted to the non-graded practical activities and to the graded activities.

6. Basic course bibliography

Basic
● FURUI, S. (2001). Digital Speech Processing, Synthesis and Recognition (Second Edition, Revised and Expanded). New York: Marcel Dekker, Inc.
● LÓPEZ-CÓZAR, R. - ARAKI, M. (2005). Spoken, Multilingual and Multimodal Dialogue Systems: Development and Assessment. Chichester: John Wiley & Sons.
● O'SHAUGHNESSY, D. (1987). Speech Communication. Human and Machine. Addison Wesley Series in Electrical Engineering, 2nd edition, 2000.
● SCHROEDER, M. R. (1999). Computer Speech. Recognition, Compression, Synthesis. Springer-Verlag.
● TAYLOR, P. (2009). Text-To-Speech Synthesis. Cambridge: Cambridge University Press.

Complementary
● GIACHIN, E. - McGLASHAN, S. (1997). "Spoken Language Systems", in YOUNG, S. - BLOOTHOOFT, G. (eds.), Corpus-Based Methods in Language and Speech Processing. Dordrecht: Kluwer Academic Publishers (Text, Speech and Language Technology, 2), pp. 69-117.
● LADEFOGED, P. (2003). Phonetic Data Analysis. An Introduction to Fieldwork and Instrumental Techniques. Malden: Blackwell.
● MARTÍNEZ-CELDRÁN, E. (1998). Análisis espectrográfico de los sonidos del habla. Barcelona: Ariel.
● PRIETO, P. (2004). Fonètica i fonologia catalanes. Els sons del català. Barcelona: Edicions de la UOC.
● QUILIS, A. (1981). Fonética acústica de la lengua española. Madrid: Gredos (Biblioteca Románica Hispánica, Manuales, 49).