
Speech Interfaces to
Virtual Reality
Scott McGlashan
Swedish Institute of Computer Science
Reporter
try
Agenda
 Purposes of this paper
 Speech Interface: features and when to integrate
 Problems of Integration
 DIVERSE System
 Enhancing DIVERSE
 Conclusion
Purposes of this paper
 Analyze the technical and design issues
of combining a virtual world with a
speech interface.
 Describe the architecture of the DIVERSE
system.
 Enhance DIVERSE to allow users to
talk directly to agents in the virtual world.
• Objects in the virtual world are manipulated through an agent.
Integrating Speech Interface
 Use speech + direct manipulation to
form the multimodal interface.
 Users are free to decide which modality
to use.
 In a multimodal user interface, the user can
issue more commands with less effort.
Benefits of Speech Interface
 Naturalness
 Hands / eyes freedom
 Beyond here and now
• Users can refer to objects that are not
present in their current view.
Successful Speech Interface
System
 Speech interfaces can be beneficial
when they are more efficient than their
alternatives. (If another method works, speech
input is not needed.)
 They have been successfully deployed as part
of interactive dialogue systems in
limited task domains.
 Ex: banking services, travel services.
Features of Successful
Speech Interface
 Restricted Language
 Incremental Information Transfer
 Feedback (questionnaire)
 Dialogue Management
Problems of Integration
 Speech Recognition: limited
vocabulary to gain accuracy.
 Language Understanding: limited
knowledge to maximize
understanding.
 Interaction Metaphor: who does the
user talk to?
Speech Recognition
 The most successful method for
acoustic-phonetic modelling is the hidden Markov model (HMM).
 An N-gram language model is also
integrated into the recognition process.
• For the structure of an HMM-based + N-gram speech
recognition engine, see:
- http://java.cc.nccu.edu.tw/pr/report.htm
 The result is a speaker-independent,
continuous speech system.
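A minimal sketch of how an N-gram language model can rank competing transcriptions, assuming a bigram model with add-one smoothing. The tiny corpus, the function names `train_bigram` and `score`, and the candidate strings are illustrative assumptions and are not taken from DIVERSE or from the recognizer it uses.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split()  # "<s>" marks the sentence start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def score(sentence, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram probability of a candidate transcription."""
    tokens = ["<s>"] + sentence.lower().split()
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return p

# Hypothetical in-domain training sentences.
corpus = ["move the red box", "bring me the cube", "move the cube"]
uni, bi = train_bigram(corpus)
vocab = len(uni)

# Two acoustically similar hypotheses; the language model prefers the in-domain one.
candidates = ["move the cube", "move a queue"]
best = max(candidates, key=lambda c: score(c, uni, bi, vocab))
print(best)
```

In a real recognizer this score would be combined with the acoustic (HMM) score during decoding rather than applied afterwards.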
Types of speech recognition engines
Language Understanding
 The system must interpret the user's
intent from the recognized speech before it can
change the virtual world accordingly.
Interaction Metaphor
 Direct manipulation - Personal Presence.
 Various metaphors for spoken
interaction have been proposed.
• Proxy
• Divinity
• Telekinesis
• Interface Agent
DIVE-Virtual Reality System
 DIVE (Distributed Interactive Virtual
Environment) is a multi-user virtual
environment.
 DIVE allows users and the environment
to interact in real time.
 DIVE contains a database composed of
hierarchically organized objects.
DIVERSE
 Augments DIVE by adding a multimodal
interface.
 The speech interface is part of the DIVERSE
system.
 Its focus was the "multimodal interface",
not the speech interface.
The DIVERSE System
DIVERSE Features
 SR: Woodland; output is a text string
 TTS: INFOVOX
 Does not manage the interaction for
users
• Does not confirm information.
• Does not correct errors when they arise.
• Always updates the world from the recognition
results.
• No dialogue management.
Reference Resolution of
DIVERSE
 Object Focus
 Property Perception
Object Focus
 A combination of parameters.
 These parameters have priorities and
may persist/decay over time.
• Ex: An object that is being pointed at has
more focus than one that is merely in the visual field.
"Bring me that box!"
Priority and persistence of objects
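A minimal sketch of a focus score built from prioritized parameters that decay over time. The weights, the half-life, and the names `WEIGHTS`, `HALF_LIFE`, `focus_score` and `resolve` are assumptions chosen for illustration; the slides do not give the actual parameters or formula used in DIVERSE.

```python
# Hypothetical priorities: pointing outranks merely being in the visual field.
WEIGHTS = {"pointed_at": 3.0, "in_view": 1.0}
HALF_LIFE = 5.0  # seconds; assumed rate at which focus decays

def focus_score(events, now):
    """Sum the weighted events for one object; each contribution halves every HALF_LIFE seconds.

    events is a list of (kind, timestamp) pairs.
    """
    return sum(WEIGHTS[kind] * 0.5 ** ((now - t) / HALF_LIFE) for kind, t in events)

def resolve(candidates, now):
    """Pick the candidate object with the highest focus score."""
    return max(candidates, key=lambda name: focus_score(candidates[name], now))

# "Bring me that box!": both boxes are visible, but box_b was pointed at a second ago.
boxes = {
    "box_a": [("in_view", 10.0)],
    "box_b": [("in_view", 10.0), ("pointed_at", 9.0)],
}
target = resolve(boxes, now=10.0)
print(target)
```

Because the contributions decay, an object that was pointed at long ago eventually loses out to one that is simply in view now.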
Property Perception
 A property holds of an object if
the semantic value of this property is the
"best fit" for that object.
 Ex: move the "red" object.
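A minimal sketch of best-fit property resolution for the "red" example, assuming each object carries an RGB colour and each colour word has a prototype value. The prototypes, the squared-distance measure, and the scene contents are illustrative assumptions, not the method described in the paper.

```python
# Hypothetical prototype colours for each property word.
PROTOTYPES = {"red": (255, 0, 0), "green": (0, 255, 0), "blue": (0, 0, 255)}

def distance(c1, c2):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2))

def best_fit(prop, objects):
    """Return the object whose colour is closest to the prototype of `prop`."""
    proto = PROTOTYPES[prop]
    return min(objects, key=lambda name: distance(objects[name], proto))

# No object is pure red, but the crate is the best fit for "red".
scene = {"crate": (200, 40, 30), "barrel": (120, 110, 20), "sign": (20, 30, 220)}
chosen = best_fit("red", scene)
print(chosen)
```

The point of the sketch is that "red" is resolved relative to the objects present, not by an absolute threshold.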
Enhancing DIVERSE
 SR and Language Understanding
 Reference Resolution
• Discourse Modeling.
 Robustness
• Confirmation and Error Handling.
 Talking Agents
SR and Language
Understanding
 One of the main weaknesses of DIVERSE
is its SR accuracy.
 Switch to a better SR engine.
 Use pre-defined grammars to increase
accuracy.
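A minimal sketch of what a pre-defined command grammar buys: only utterances that fit a small fixed pattern are accepted, so everything else can be rejected instead of guessed at. The grammar here is a regular expression applied to recognized text; the verbs, colours, and nouns are assumptions for illustration, and a real SR engine would apply its grammar during recognition rather than afterwards.

```python
import re

# Hypothetical restricted grammar: VERB [the|that] [COLOUR] NOUN
COMMAND = re.compile(
    r"^(?P<verb>move|bring me|select) (?:the |that )?"
    r"(?:(?P<color>red|green|blue) )?(?P<noun>box|cube|tank|object)$"
)

def parse(utterance):
    """Return the slots of a command, or None if it is outside the grammar."""
    m = COMMAND.match(utterance.lower().strip(" .!"))
    return m.groupdict() if m else None

print(parse("Bring me that box!"))
print(parse("move the red object"))
print(parse("open the pod bay doors"))
```

An utterance that parses to `None` is a natural trigger for a clarification request instead of an update to the world.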
Discourse Modeling
 The search will be inefficient if the search space
is always the whole virtual environment.
 With discourse modeling, we can constrain the
search space.
 Ex: for "Bring me the cube.", the reference of "the
cube" should be resolved only within the user's field of view.
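A minimal sketch of constraining reference resolution to a small search space. The scope order (objects already mentioned in the dialogue, then objects in the field of view) and the names `resolve_reference`, `world`, and `visible` are assumptions for illustration; the slides only state that the search should not cover the whole environment.

```python
def resolve_reference(noun, world, in_view, mentioned):
    """Return objects of type `noun`, searching the narrowest scope first."""
    # Assumed scope order: entities already mentioned in the dialogue,
    # then objects in the user's field of view. The whole world is never searched.
    for scope in (mentioned, in_view):
        matches = [obj for obj in scope if world.get(obj) == noun]
        if matches:
            return matches
    return []

# Two cubes exist, but only cube_2 is currently visible to the user.
world = {"cube_1": "cube", "cube_2": "cube", "box_1": "box"}
visible = ["cube_2", "box_1"]
result = resolve_reference("cube", world, visible, mentioned=[])
print(result)
```

An empty result means the reference could not be resolved in the constrained space, which is a reasonable point to ask the user for clarification.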
Talking Agents
 In DIVERSE, all objects in the world are
manipulated through a single agent.
• Interaction metaphor - Interface Agent.
 The author divides all objects in the world into
talking agents and non-talking agents.
• Interaction metaphor - Proxy.
Agent Modeling Framework
 The parent agent provides the basic
functions.
 We can define more specific agents by
extending the parent agent.
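A minimal sketch of the parent/child idea using class inheritance. The class names, methods, and the door example are hypothetical and chosen only to show the pattern; the slides do not describe the actual framework's interface.

```python
class Agent:
    """Parent agent: basic behaviour shared by every talking agent (assumed)."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"{self.name} here."

    def handle(self, utterance):
        # Default fallback when no more specific agent understands the input.
        return "Sorry, I did not understand."


class DoorAgent(Agent):
    """A more specific agent defined by extending the parent (hypothetical)."""

    def __init__(self, name):
        super().__init__(name)
        self.is_open = False

    def handle(self, utterance):
        if utterance == "open":
            self.is_open = True
            return "Door opened."
        # Anything else falls back to the parent's behaviour.
        return super().handle(utterance)


door = DoorAgent("door 3")
print(door.greet())
print(door.handle("open"))
print(door.handle("dance"))
```

Only the object-specific commands live in the subclass; greeting and fallback handling are inherited.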
Example:Launcher Agent
 Launcher: launcher 476 here.
 User: target red tank.
 Launcher: please specify coordinates of red
tank.
 User: 437,342
 Launcher: targeted red tank at
437,342. Launch missile?
 User: confirm missile launch.
 Launcher: missile launched. over and out?
 User: over and out.
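The dialogue above can be read as a small state machine: collect the target, ask for the missing coordinates, confirm before acting, then close. The sketch below is one possible reading under that assumption; the state names, the `LauncherAgent` class, and the "please repeat" fallbacks are illustrative and not taken from the paper.

```python
import re

class LauncherAgent:
    """Hypothetical dialogue manager for the launcher example."""

    def __init__(self, ident):
        self.ident = ident
        self.state = "idle"
        self.target = None
        self.coords = None

    def greet(self):
        return f"launcher {self.ident} here."

    def respond(self, utterance):
        text = utterance.lower().strip().rstrip(".")
        if self.state == "idle" and text.startswith("target "):
            # Incremental information transfer: the target is known, coordinates are not.
            self.target = text[len("target "):]
            self.state = "need_coords"
            return f"please specify coordinates of {self.target}."
        if self.state == "need_coords":
            m = re.fullmatch(r"(\d+)\s*,\s*(\d+)", text)
            if m:
                self.coords = (int(m.group(1)), int(m.group(2)))
                self.state = "confirm"
                x, y = self.coords
                # Feedback plus explicit confirmation before an irreversible action.
                return f"targeted {self.target} at {x},{y}. Launch missile?"
            return "please repeat the coordinates."
        if self.state == "confirm" and text == "confirm missile launch":
            self.state = "done"
            return "missile launched. over and out?"
        if self.state == "done" and text == "over and out":
            self.state = "idle"
            return ""  # conversation closed, nothing more to say
        return "please repeat."


agent = LauncherAgent(476)
print(agent.greet())
for line in ["target red tank.", "437,342", "confirm missile launch.", "over and out."]:
    print(agent.respond(line))
```

Unlike the original DIVERSE behaviour, nothing changes in the world until the user has confirmed, and an unrecognized input leads to a request to repeat rather than an action.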
Conclusion
 Combining speech with virtual worlds
provides natural interaction.
 Speech interfaces work well when
they cooperate with other user interfaces.
 The author enhances DIVERSE to gain
further benefits.
Q&A
Backup
DIVERSE System Architecture