Spoken Language Processing for Artificial Cognitive Systems

© 2006 The University of Sheffield
Prof. Roger K. Moore
Chair of Spoken Language Processing
Speech and Hearing Research Group (SPandH)
Dept. Computer Science, Sheffield University, UK
IST Event 2006: Cognitive Systems, Interaction & Robotics - slide 1
Spoken Language Technology
• Automatic Speech Recognition
• Text-to-Speech Synthesis
• Spoken Language Dialogue Systems
Recent Successes
• The majority of mobile phones have voice dialling
• Software for dictating documents on your PC is available in most computer stores
• Interactive Voice Response (IVR) systems are becoming commonplace for handling telephone enquiries
After all … “speech is the most natural mode of human-machine interaction”
Scientific & Practical Progress
[Figure: speech recognition applications from 1980 to 2000, plotted as SPEAKING STYLE (isolated words, connected speech, read speech, fluent speech, spontaneous speech) against VOCABULARY SIZE (#words: 2, 20, 200, 2000, 20000, unrestricted). Labelled applications: voice commands, digit strings, word spotting, name dialing, directory assistance, form filling by voice, office dictation, system driven dialogue, network agent & intelligent messaging, transcription, 2-way dialogue, natural conversation.]
Figure by permission of Prof. Sadaoki Furui, TIT, Japan.
The Current Situation
• Progress has not come about as a result of deep insights into human spoken language ☺
• Improvements have come from:
– a ‘data-driven’ approach
– increases in available computer power
– benchmark testing
• However, current SLP technology is:
– fragile (in ‘real’ conditions)
– expensive (to port to new applications and languages)
– limited in performance, which appears to be asymptoting well short of human abilities (~25% word error rate on conversational speech)
• Moreover, it is not natural to talk to a machine (nor do machines talk naturally to us)
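The ~25% figure above is a word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recogniser output into the reference transcript, divided by the number of reference words. A minimal sketch of that calculation follows; the function name and the example sentences are illustrative assumptions, not taken from the talk, and real benchmark scoring also applies text normalisation that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length.

    Illustrative helper: words are split on whitespace with no normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words gives a WER of 0.25.
print(word_error_rate("call home now please", "call phone now please"))
```

Because insertions are counted, WER can exceed 100% when the recogniser produces many spurious words.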
Human-Human Communication?
Capability Shortfalls
“The industry has yet to bridge the gap between what people want and what it can deliver.”
“Reducing the ASR error rate remains the greatest challenge.”
‘Making Speech Mainstream’, X. D. Huang, Microsoft Speech Technologies Group, 2002.
“After sixty years of concentrated research and development in speech synthesis and text-to-speech (TTS), our gadgets, gizmos, executive toys and appliances still do not speak to us intelligently.”
‘Fiction and Reality of TTS’, Speech Technology Magazine, vol.7, no.1, Jan/Feb 2002.
Current Trends
• Embedded technology for media management
• Embodied technology for conversational interaction
Our Understanding of SLP is …
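[Figure: level of understanding, on a scale from Perfect Ignorance to Perfect Knowledge, plotted over Past, Present and Future, with candidate trajectories labelled: increasing steadily? accelerating? saturating? falling!]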
‘Real’ Spoken Language
“The most sophisticated behaviour of the most complex organism in the known universe!”
The Way Forward for SLP?
• We need a radical re-think of our approach(es)
• Spoken language needs to be treated as an intimate part of an artificial cognitive system, not simply as a fancy interface technology
• Scientific reductionism has led us to treat recognition, synthesis and dialogue independently
• The wider communicative function of speech tends to be ignored
• Systematic behaviour resulting from speaker-listener interaction is thus observed (and modelled) as random variation
The Way Forward for SLP?
Time to take a whole-system view:
– model speech as a behaviour that is conditioned on communicative context
– where:
• the talker has in mind the needs of the listener(s)
+
• the listener has in mind the intentions of the talker(s)
The Speech Chain
Taken from ‘The Speech Chain: The Physics and Biology of Spoken Language’, P. B. Denes & E. N. Pinson, New York: Anchor Press, 1973.
The Communicative ‘Loop’
[Figure: communicative loop between HUMAN and MACHINE; diagram labels: Feedback, Control, Process.]
Moore R K. 'Cognitive Informatics: The Future of Spoken Language Processing?', Keynote talk, SPECOM - 10th Int. Conf. on Speech and Computer, Patras, Greece, 17-19 October (2005).
What is Controlled in Speaking?
• Listener behaviour
• Listener’s perception of the linguistic message:
– intelligibility
– comprehensibility
• Listener’s perception of the affective state:
– emotion
– mood
– interpersonal stances
– attitudes
– personality traits
• Listener’s perception of individuality
What is Controlled in Speaking?
• Listener behaviour
• Listener’s perception of the linguistic message:
– intelligibility
– comprehensibility
• Listener’s perception of the affective state:
– emotion
– mood
– interpersonal stances
– attitudes
– personality traits
• Listener’s perception of individuality
Such factors can only be controlled under arbitrary conditions if there is a feedback loop
What is Controlled in Listening?
• Attention
– allocation of computational resources (listening effort)
– weighting of sensory data
• Expectations
– predictions of speaker behaviour
– based on exemplar and/or abstracted data
– can be viewed as a generative model (i.e. a model of the speaker)
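The slide does not say how expectations and sensory data are combined. One common way to illustrate "weighting of sensory data" against a prediction is reliability-weighted fusion of two Gaussian estimates, sketched below. The function name, the scalar acoustic feature and the Gaussian assumption are all illustrative choices, not the speaker's model.

```python
def fuse(prediction: float, prediction_var: float,
         observation: float, observation_var: float):
    """Combine an expectation with sensory evidence (illustrative sketch).

    Both inputs are treated as Gaussian estimates of the same scalar
    quantity. The weight given to the observation grows as the
    observation becomes more reliable relative to the prediction.
    """
    if prediction_var <= 0 or observation_var <= 0:
        raise ValueError("variances must be positive")

    # Weight on the sensory data: high when the prediction is uncertain,
    # low when the observation is noisy.
    obs_weight = prediction_var / (prediction_var + observation_var)

    estimate = prediction + obs_weight * (observation - prediction)
    variance = prediction_var * observation_var / (prediction_var + observation_var)
    return estimate, variance, obs_weight


# Equally reliable sources: the estimate lies midway between them.
print(fuse(0.0, 1.0, 2.0, 1.0))

# Noisy observation (e.g. speech in noise): the listener leans on expectation.
print(fuse(0.0, 1.0, 2.0, 9.0))
```

In this toy setting, raising `observation_var` plays the role of a noisy channel: the estimate stays close to what the generative model predicted, and the fused variance is always smaller than either input variance.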
What is Controlled in Listening?
• Attention
– allocation of computational resources (listening effort)
– weighting of sensory data
• Expectations
– predictions of speaker behaviour
– based on exemplar and/or abstracted data
– can be viewed as a generative model (i.e. a model of the speaker)
Such factors have to be controlled dynamically within a communicative feedback process
Evidence for Control in Speech
• Being able to hear your own voice has a profound effect on speaking:
– hearing-impaired individuals can have great difficulty maintaining clear pronunciations (or level control)
– delayed feedback causes stuttering-like behaviour
– people naturally tend to speak louder/differently in noise (Lombard, 1911) …
• Speakers actively control their articulatory effort:
– ‘H&H theory’ (Lindblom, 1990)
• Carers talk differently to children:
– ‘parentese’ …
Implications for Future SLP
• A new approach to speech generation that:
– selects its characteristics appropriate to the needs of the listener
– monitors the effect of its own output
– modifies its behaviour according to its internal model of the listener
• A new approach to speech recognition that:
– uses a generative model based on speech synthesis rather than HMMs
– adapts its generative model to the voice of the talker based on knowledge of its own voice
• No contemporary systems exploit such sensorimotor integration, overlap & control
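The "monitor the effect, then modify behaviour" idea for speech generation can be sketched as a simple closed loop. In the toy example below, a talker raises or lowers its output level until a simulated listener reaches a target intelligibility. Everything here is an assumption made for illustration: the logistic listener model, the proportional update rule, and all names and parameter values. It is not a description of any existing synthesiser.

```python
import math


def listener_intelligibility(speech_db: float, noise_db: float) -> float:
    """Hypothetical listener: intelligibility as a logistic function of SNR (dB)."""
    snr = speech_db - noise_db
    return 1.0 / (1.0 + math.exp(-0.5 * snr))


def speak_with_feedback(noise_db: float, target: float = 0.9,
                        start_db: float = 60.0, gain: float = 10.0,
                        steps: int = 50):
    """Adjust speech level step by step to hold a target intelligibility.

    Returns a list of (level_db, observed_intelligibility) pairs.
    """
    level = start_db
    history = []
    for _ in range(steps):
        # Monitor the effect of the current output on the listener.
        observed = listener_intelligibility(level, noise_db)
        history.append((level, observed))
        # Modify behaviour in proportion to the shortfall (or surplus).
        level += gain * (target - observed)
    return history


quiet = speak_with_feedback(noise_db=50.0)
noisy = speak_with_feedback(noise_db=70.0)
print("final level in quiet:", round(quiet[-1][0], 1), "dB")
print("final level in noise:", round(noisy[-1][0], 1), "dB")
```

With the same target, the loop settles at a higher level in the noisier condition and backs off in quiet, which loosely mirrors the Lombard-style adaptation cited earlier in the talk. An open-loop talker with a fixed level could not do this for arbitrary noise conditions.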
Cognition & Control in SLP
[Figure: block diagram. Input side: audio, visual and haptic sensors; multi-modal data fusion; PERCEPTION (word recognition, emotion, individuality); INTERPRETATION. Output side: EXPRESSION; PRODUCTION (word generation, emotion, individuality); multi-modal synchronisation; audio, visual and haptic effectors. Also labelled: COGNITION (planning, modelling), DIALOGUE, unified models.]
Key Research Issues
• Communication: entropy management
– particulate structure of language
– coordination, cooperation, alignment, empathy
• Behaviour: energy management
– effort
– attention
• Planning: time management
– emulation, imitation
– memory
– prediction
– search
Summary & Conclusion
• Cognitive Informatics
– potential of an exciting new era in SLP research
• SLP at the heart of artificial cognitive systems
– more than an interface technology
– evolution & acquisition of communicative skills
– grounded behaviour
– real speech, real environments
– naturalness → appropriateness
• Multimodality
– speech and gesture
– audio-visual integration
• Interdisciplinarity is key
The Future of SLP-Enabled ACS?
And Finally … a challenge!
2009 Loebner Competition
1st Turing test of speech-based human-machine interaction
Thank you