Slide 1: Spoken Language Processing for Artificial Cognitive Systems
Prof. Roger K. Moore, Chair of Spoken Language Processing
Speech and Hearing Research Group (SPandH), Dept. Computer Science, Sheffield University, UK
IST Event 2006: Cognitive Systems, Interaction & Robotics
© 2006 The University of Sheffield

Slide 2: Spoken Language Technology
• Automatic Speech Recognition
• Text-to-Speech Synthesis
• Spoken Language Dialogue Systems

Slide 3: Recent Successes
• The majority of mobile phones have voice dialling
• Software for dictating documents on your PC is available in most computer stores
• Interactive Voice Response (IVR) systems are becoming commonplace for handling telephone enquiries
After all … "speech is the most natural mode of human-machine interaction"

Slide 4: Scientific & Practical Progress
[Figure: speech recognition applications plotted against SPEAKING STYLE (isolated words, connected speech, read speech, fluent speech, spontaneous speech) and VOCABULARY SIZE (2 to 20,000+ words, up to unrestricted), from voice commands and digit strings around 1980, through name dialling, directory assistance, form filling and dictation by voice in the 1990s, to network agents, intelligent messaging, transcription, two-way dialogue and natural conversation around 2000. Figure by permission of Prof. Sadaoki Furui, TIT, Japan.]

Slide 5: The Current Situation
• Progress has not come about as a result of deep insights into human spoken language ☺
• Improvements have come from:
  – a 'data-driven' approach
  – the increase in available computer power
  – benchmark testing
• However, current SLP technology is:
  – fragile (in 'real' conditions)
  – expensive (to port to new applications and languages)
  – asymptoting in performance well short of human abilities (~25% word error rate on conversational speech; the WER metric is sketched below)
• + it is not natural to talk to a machine (nor do machines talk naturally to us)
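The ~25% word error rate cited above is the standard WER metric used in benchmark testing. As a point of reference, here is a minimal sketch of how it is computed from the word-level edit distance between a reference transcription and a recogniser's output; the two sentences in the example are invented purely for illustration.

```python
# Minimal WER sketch: WER = (substitutions + deletions + insertions) / reference length,
# obtained from the word-level edit distance between reference and hypothesis.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1      # substitution
            d[i][j] = min(d[i - 1][j] + 1,                   # deletion
                          d[i][j - 1] + 1,                   # insertion
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / len(ref)

# Invented conversational fragment and a plausible mis-recognition of it:
print(word_error_rate("well i think we should go now",
                      "well i think he should know"))        # 3 errors / 7 words ≈ 0.43
```

A WER of 0.25 therefore means roughly one word in four is wrong relative to what was actually said.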
Slide 6: Human-Human Communication?

Slide 7: Capability Shortfalls
"The industry has yet to bridge the gap between what people want and what it can deliver." "Reducing the ASR error rate remains the greatest challenge."
  – 'Making Speech Mainstream', X. D. Huang, Microsoft Speech Technologies Group, 2002.
"After sixty years of concentrated research and development in speech synthesis and text-to-speech (TTS), our gadgets, gizmos, executive toys and appliances still do not speak to us intelligently."
  – 'Fiction and Reality of TTS', Speech Technology Magazine, vol. 7, no. 1, Jan/Feb 2002.

Slide 8: Current Trends
• Embedded technology for media management
• Embodied technology for conversational interaction

Slide 9: Our Understanding of SLP is …
[Figure: a curve running from 'Perfect Ignorance' towards 'Perfect Knowledge' over time (Past, Present, Future), asking whether our understanding is increasing steadily, accelerating, saturating, or falling.]

Slide 10: 'Real' Spoken Language
"The most sophisticated behaviour of the most complex organism in the known universe!"

Slide 11: The Way Forward for SLP?
• We need a radical re-think of our approach(es)
• Spoken language needs to be treated as an intimate part of an artificial cognitive system, not simply as a fancy interface technology
• Scientific reductionism has led us to treat recognition, synthesis and dialogue independently
• The wider communicative function of speech tends to be ignored
• Systematic behaviour resulting from speaker-listener interaction is thus observed (and modelled) as random variation

Slide 12: The Way Forward for SLP?
Time to take a whole-system view:
  – model speech as a behaviour that is conditioned on communicative context
  – where:
    • the talker has in mind the needs of the listener(s), and
    • the listener has in mind the intentions of the talker(s)

Slide 13: The Speech Chain
[Figure taken from 'The Speech Chain: The Physics and Biology of Spoken Language', P. B. Denes & E. N. Pinson, New York: Anchor Press, 1973.]

Slide 14: The Communicative 'Loop'
[Figure: human and machine coupled through a feedback control process.]
Moore R. K., 'Cognitive Informatics: The Future of Spoken Language Processing?', keynote talk, SPECOM - 10th Int. Conf. on Speech and Computer, Patras, Greece, 17-19 October 2005.

Slide 15: What is Controlled in Speaking?
• Listener behaviour
• Listener's perception of the linguistic message:
  – intelligibility
  – comprehensibility
• Listener's perception of the affective state:
  – emotion
  – mood
  – interpersonal stances
  – attitudes
  – personality traits
• Listener's perception of individuality

Slide 16: What is Controlled in Speaking?
(Same list as slide 15, with the overlay:) Such factors can only be controlled under arbitrary conditions if there is a feedback loop.

Slide 17: What is Controlled in Listening?
• Attention
  – allocation of computational resources (listening effort)
  – weighting of sensory data
• Expectations
  – predictions of speaker behaviour
  – based on exemplar and/or abstracted data
  – can be viewed as a generative model (i.e. a model of the speaker)

Slide 18: What is Controlled in Listening?
(Same list as slide 17, with the overlay:) Such factors have to be controlled dynamically within a communicative feedback process (a toy sketch of such a predictive listener follows below).
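To make "expectations as a generative model of the speaker" and the dynamic weighting of sensory data concrete, the sketch below implements a toy predictive listener. Everything in it is invented for illustration (the vocabulary, the probabilities, the blending rule); it is not a proposal for a real recogniser.

```python
import math

# Toy predictive listener: expectations = a generative model of the speaker,
# attention = how heavily noisy sensory evidence is weighted against them.

# Hypothetical generative model of the speaker: P(next word | previous word).
SPEAKER_MODEL = {
    "good": {"morning": 0.7, "evening": 0.2, "grief": 0.1},
    "turn": {"left": 0.5, "right": 0.5},
}

def listen(prev_word, acoustic_scores):
    """Blend expectation (prior) with acoustic evidence (likelihood)."""
    prior = SPEAKER_MODEL.get(prev_word, {})
    if not prior:  # no expectation available: rely on the evidence alone
        prior = {w: 1 / len(acoustic_scores) for w in acoustic_scores}

    # Listening effort: the more uncertain the expectation (higher entropy),
    # the more weight the sensory data receives.
    entropy = -sum(p * math.log2(p) for p in prior.values() if p > 0)
    max_entropy = math.log2(len(prior))
    attention = entropy / max_entropy if max_entropy > 0 else 1.0

    # Posterior ∝ prior^(1 - attention) * likelihood^attention (a simple blend)
    posterior = {w: (prior.get(w, 1e-6) ** (1 - attention)) *
                    (acoustic_scores.get(w, 1e-6) ** attention)
                 for w in set(prior) | set(acoustic_scores)}
    total = sum(posterior.values())
    return {w: p / total for w, p in posterior.items()}, attention

# Made-up noisy acoustics: 'morning' and 'mourning' are confusable.
beliefs, attn = listen("good", {"morning": 0.4, "mourning": 0.5, "evening": 0.1})
print(round(attn, 2), max(beliefs, key=beliefs.get))
```

In this run the acoustics slightly favour "mourning", but the listener's model of the talker pulls the decision to "morning"; with a flatter (more uncertain) expectation the attention weight rises towards 1 and the sensory evidence dominates instead.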
Slide 19: Evidence for Control in Speech
• Being able to hear your own voice has a profound effect on speaking:
  – hearing-impaired individuals can have great difficulty maintaining clear pronunciation (or level control)
  – delayed auditory feedback causes stuttering-like behaviour
  – people naturally tend to speak louder/differently in noise (Lombard, 1911) …
• Speakers actively control their articulatory effort:
  – 'H&H theory' (Lindblom, 1990)
• Carers talk differently to children:
  – 'parentese' …

Slide 20: Implications for Future SLP
• A new approach to speech generation that:
  – selects its characteristics according to the needs of the listener
  – monitors the effect of its own output
  – modifies its behaviour according to its internal model of the listener
• A new approach to speech recognition that:
  – uses a generative model based on speech synthesis rather than HMMs (see the sketch below)
  – adapts its generative model to the voice of the talker based on knowledge of its own voice
• No contemporary systems exploit such sensorimotor integration, overlap & control
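The recognition idea proposed above can be caricatured in a few lines. The sketch below is a toy "recognition by synthesis" loop under heavy assumptions: words are represented by made-up 4-dimensional feature vectors, "synthesising" a word just means looking up the machine's own template, and speaker adaptation is a running average of the residual between what was heard and what the machine's own voice would have produced. It is not the author's actual proposal or any real system; it only illustrates the shape of the idea.

```python
import numpy as np

# Toy 'recognition by synthesis': score each word by how well a synthesised
# rendering of it (shifted towards the current talker) explains the input,
# then use the residual against the machine's own voice to adapt.

# The machine's own voice: each word maps to a hypothetical feature vector.
OWN_VOICE = {
    "yes":   np.array([1.0, 0.2, 0.1, 0.7]),
    "no":    np.array([0.1, 0.9, 0.8, 0.2]),
    "maybe": np.array([0.5, 0.5, 0.5, 0.5]),
}

def recognise(observed, speaker_offset):
    """Pick the word whose synthesis best matches what was heard."""
    scores = {w: np.linalg.norm(observed - (template + speaker_offset))
              for w, template in OWN_VOICE.items()}
    return min(scores, key=scores.get)

def adapt(observed, word, speaker_offset, rate=0.3):
    """Track the talker: blend in the residual between the observation and
    what the machine's own voice would have produced for the decoded word."""
    residual = observed - OWN_VOICE[word]
    return (1 - rate) * speaker_offset + rate * residual

# Simulated talker whose voice is shifted away from the machine's own.
true_shift = np.array([0.3, -0.1, 0.2, 0.0])
rng = np.random.default_rng(0)
offset = np.zeros(4)
for spoken in ["yes", "no", "maybe", "yes"]:
    observed = OWN_VOICE[spoken] + true_shift + rng.normal(0, 0.05, 4)
    heard = recognise(observed, offset)
    offset = adapt(observed, heard, offset)   # perception updates the talker model
    print(spoken, "->", heard)
```

The adaptation step is the point the slide makes: the recogniser exploits knowledge of its own voice to track how the current talker differs from it, closing the kind of sensorimotor loop the slide argues contemporary systems lack.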
Slide 21: Cognition & Control in SLP
[Figure: a layered architecture linking audio, visual and haptic sensors and effectors through multi-modal data fusion and multi-modal synchronisation to PERCEPTION and PRODUCTION (word recognition/generation, emotion, individuality, unified models), INTERPRETATION and EXPRESSION (modelling, planning), and COGNITION and DIALOGUE at the top.]

Slide 22: Key Research Issues
• Communication: entropy management
  – particulate structure of language
  – coordination, cooperation, alignment, empathy
• Behaviour: energy management
  – effort
  – attention
• Planning: time management
  – emulation, imitation
  – memory
  – prediction
  – search

Slide 23: Summary & Conclusion
• Cognitive Informatics
  – the potential of an exciting new era in SLP research
• SLP at the heart of artificial cognitive systems
  – more than an interface technology
  – evolution & acquisition of communicative skills
  – grounded behaviour
  – real speech, real environments
  – naturalness → appropriateness
• Multimodality
  – speech and gesture
  – audio-visual integration
• Interdisciplinarity is key

Slide 24: The Future of SLP-Enabled ACS?

Slide 25: And Finally … a challenge!
2009 Loebner Competition: the first Turing test of speech-based human-machine interaction

Slide 26: Thank you