Spoken language technologies

Centro per la Ricerca Scientifica e Tecnologica
Spoken language technologies: recent advances and
future challenges
Gianni Lazzari
VIENNA July 26
SUMMARY
• Short introduction on SLT
• Where are we today?
• TC-STAR and RAI projects
• Outlook for the future
Focus on the use of Spoken Language Technologies for multilingual transcription and reporting tasks
Typical tasks in Human Language Technologies (HLT)
• speech recognition (voice commands & speech transcription)
• character recognition
• object and gesture recognition
• (spoken and written) language understanding
• spoken dialog systems
• speech synthesis
• text summarization
• document classification and information retrieval
• syntactic analysis of natural language
• speech and text translation
• ...
Spoken Language Technologies: recent
advances and future challenges
3
General Spoken Language System Architecture
[Figure: input → Recognition → Understanding and dialog → Generation and Synthesis → answer]
MODELS: acoustic, language, semantic, dialog, synthesis
Speech Transcription System Architecture
[Figure: Input (audio: noise, speech, music, ...) → Recognition → results: enriched text]
MODELS: acoustic, language, speakers, speech, music, noise
Typical Transcription System
Standard Automatic Speech
Recognition Architecture
Word error rate of different speech recognition tasks

Task             WER      Type of speech   Target     Bandwidth
Dictation        7%       well formed      computer   FWB
Broadcast news   12%      various          audience   FWB
Switchboard      20-30%   spontaneous      person     TWB
Voicemail        30%      spontaneous      person     TWB
Meetings         50-60%   spontaneous      person     FF

The features characterizing these tasks are:
• type of speech: well formed vs spontaneous
• target of communication: computer, audience, person
• bandwidth:
  - FWB, full bandwidth
  - TWB, telephone bandwidth
  - FF, far field
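The word error rates quoted for these tasks are conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that computation, assuming whitespace tokenization and equal cost for substitutions, deletions and insertions; the function name `word_error_rate` is illustrative and does not come from the slides.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "vorrei prenotare un albergo a francoforte"
    hyp = "vorrei prenotare albergo a francoforte"
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Because insertions are counted, this measure can exceed 100% on very noisy output, which is one reason far-field meeting transcription can score so much worse than dictation.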
RAI Italian Broadcast news
Transcription
Evaluation of the Italian broadcast news transcription task
• Acoustic models are trained with a speaker-adaptive acoustic modelling procedure.
• Two sets of acoustic models were trained, one for wideband and one for narrowband speech, each on about 140 hours of speech.
• The LM was estimated on a 226M-word corpus consisting mostly of newspaper articles, plus BN transcripts.
• The LM is compiled into a static network with a shared-tail topology.
Word error rate (%) on the Italian broadcast news transcription task

                     Wideband          Narrowband        Overall
                     First   Second    First   Second    First   Second
                     pass    pass      pass    pass      pass    pass
Old                  15.5    14.2      25.2    22.4      17.6    16.0
New                  14.6    11.7      21.0    17.1      16.0    12.9
Relative reduction   5.8%    17.6%     16.7%   23.7%     9.1%    19.4%
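The relative-reduction row follows from the old and new error rates as (old - new) / old. A short sketch of that arithmetic, using the figures from the slide; the function and dictionary names are illustrative only.

```python
def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (old - new) / old."""
    return 100.0 * (old_wer - new_wer) / old_wer


# (old WER, new WER) pairs taken from the table on the slide.
results = {
    "wideband, first pass": (15.5, 14.6),
    "wideband, second pass": (14.2, 11.7),
    "narrowband, first pass": (25.2, 21.0),
    "narrowband, second pass": (22.4, 17.1),
    "overall, first pass": (17.6, 16.0),
    "overall, second pass": (16.0, 12.9),
}

if __name__ == "__main__":
    for condition, (old, new) in results.items():
        print(f"{condition}: {relative_reduction(old, new):.1f}%")
```

Note that the gain is larger in the second pass and on narrowband speech, i.e. where the old system was weakest.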
STATISTICAL TRANSLATION BASED ON BAYESIAN DECISION RULE
[Figure: Speech recognition → source language text ("Vorrei prenotare un albergo a Francoforte") → Transformation → Global Search → Transformation → target language text ("I want to reserve a hotel room in Frankfurt") → Speech synthesis]
Models used by the global search: lexicon model, alignment model, language model
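The slide names the decision rule without writing it out. In the usual source-channel formulation of statistical translation it reads as below. The symbols are assumed notation, not taken from the slide: f is the source sentence, e a candidate target sentence, and a a word alignment between them.

```latex
\hat{e} \;=\; \arg\max_{e} \Pr(e \mid f)
        \;=\; \arg\max_{e} \; \Pr(e) \cdot \Pr(f \mid e),
\qquad
\Pr(f \mid e) \;=\; \sum_{a} \Pr(f, a \mid e)
```

Here Pr(e) plays the role of the language model, Pr(f, a | e) is typically factored into the alignment model and the lexicon model, and the arg max corresponds to the global search.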
Statistical Translation System
Experimental findings in HLT research (1973-2004)
• statistical methods most successful:
  - in particular: speech recognition, language translation, parsing, dialog systems, ...
• scientific foundations:
  - methods of computer science, statistical modelling, information theory
• handling huge amounts of data:
  - 200 hours of speech recordings, 100 million running words, ...
• learning from data:
  - fully automatic procedures
  - more data than can be processed by human experts
• efficient algorithms:
  - search/decision algorithms for heuristic search
• ...
Research on HLT: 1973-2004
• speech recognition (1973-2004)
  - most of the progress: by pure statistical modelling
  - some progress: by weak acoustic-phonetic-linguistic knowledge, i.e. domain-specific knowledge
  - virtually no progress: by classical rule-based and AI methods
• similar recent experience (1993-2004)
  - machine translation, information extraction, dialog systems, ...
• expectation for future progress in HLT
  - most important: methodology: computer science, statistical modelling, information theory
  - domain-specific knowledge: acoustics, phonetics, linguistics, ...
Spoken language translation: joint projects (national, European, international): ATR, C-Star, Verbmobil, Eutrans, Nespole!, Fame, LC-Star, PF-Star, TC-STAR
• restricted domains:
  - appointment scheduling, conference registration, travelling, tourism information, ...
• vocabulary size: 3 000 – 10 000 words
• best performing systems and approaches: data-driven
  - example-based methods
  - finite-state transducers
  - statistical approaches
  e.g.: Verbmobil evaluation [June 2000]: better by a factor of 2
• written language translation: US Tides project 2001-2004
  - unrestricted domain: press news, vocabulary size ≈ 50 000 words
  - language pairs: Chinese→English, Arabic→English
  - performance [July 2003]: best statistical systems are better than conventional/commercial systems
VI FRAMEWORK PROGRAM
PRIORITY Multimodal Interfaces
IST-2002-2.3.1.6
TC-STAR
Technology and Corpora
for Speech to Speech
Translation
Contract Nr. FP6 506738
PARTNERS
TC-STAR
The TC-STAR project focuses on advanced research in
key technologies for speech-to-speech translation:
- speech recognition (ASR)
- spoken language translation (SLT)
- speech synthesis (TTS)
- Start: April 2004
- End: March 2007
- Grant: 11 million euro
- Methodology:
  - competitive evaluation
  - cooperation
Vision
Transcription and translation of broadcast news, speeches, lectures and interviews
[Figure labels: simultaneous translation, vocal access, web access; speech bubble: "Hi, What do you think about"]
Application Scenario
A selection of unconstrained conversational speech domains:
- Broadcast news
- European Parliament Plenary Session
A few languages important for European society and economy:
- European-accented English
- European Spanish
- Chinese
2005 FIRST EVALUATION RESULTS ON
THE EUROPEAN PARLIAMENT
PLENARY SESSION
TASK
The Evaluation Tasks and Databases
Translation tasks:
– English to Spanish: EPPS (European Parliament Plenary Sessions)
– Spanish to English: EPPS (European Parliament Plenary Sessions)
Three types of input to SLT:
– output of automatic speech recognition
– verbatim manual transcriptions
– final text editions (with punctuation marks)
Training data
• Sentence-aligned speeches and their translations
• Final text editions: from April 1996 to Oct. 4th, 2004
• Verbatim transcriptions: from May 2004 to Oct. 4th, 2004
Development data: Oct. 26, 2004
Evaluation data: Nov. 14, 2004
ASR EPPS DATA (word error rate, WER)
- European-accented English: 9.5% (best TC-STAR)
- European Spanish: 10.1% (best TC-STAR)
SLT EPPS DATA (position-independent word error rate, PER)
- English to Spanish: 49% (best PARTNER result)
- Spanish to English: 46% (best PARTNER result)
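The translation results are reported as a position-independent word error rate, which ignores word order and compares the reference and the system output as bags of words. The slides do not give the exact formula, so the sketch below assumes one common definition: the longer of the two lengths minus the number of matching words, divided by the reference length. The function name and the whitespace tokenization are also assumptions.

```python
from collections import Counter


def position_independent_wer(reference: str, hypothesis: str) -> float:
    """Bag-of-words error rate: word order is ignored, only word counts matter."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Multiset intersection: each word counts as matched at most
    # min(count in reference, count in hypothesis) times.
    matches = sum((Counter(ref) & Counter(hyp)).values())

    # Unmatched reference words are errors; surplus hypothesis words are too.
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    ref = "i want to reserve a hotel room in frankfurt"
    reordered = "in frankfurt i want a hotel room to reserve"
    print(position_independent_wer(ref, reordered))
```

Since reordering is not penalized, this measure is never higher than the ordinary word error rate for the same pair of sentences, which makes it a lenient view of translation quality.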
“The spoken translation problem … is still a
significant challenge:
Good text translation was hard enough to pull off.
Speech to speech MT was beyond going to the Moon
– it was Mars…”
[Steve Silberman, Wired Magazine].