Speech Interfaces to Virtual Reality
Scott McGlashan
Swedish Institute of Computer Science
Reporter: try

Agenda
Purposes of this paper
Speech interface: features and when to integrate
Problems of integration
DIVERSE system
Enhancing DIVERSE
Conclusion

Purposes of this paper
Analyze the technical and design issues in combining a virtual world with a speech interface.
Describe the architecture of the DIVERSE system.
Enhance DIVERSE to allow users to talk directly to agents in the virtual world.
• Objects in the virtual world are manipulated through the agent.

Integrating Speech Interface
Use speech plus direct manipulation to form a multimodal interface.
Users are free to decide which modality to use.
In a multimodal user interface, users can issue more commands with less effort.

Benefits of Speech Interface
Naturalness
Hands/eyes freedom
Beyond the here and now
• Users can refer to objects that are not present in their current view.

Successful Speech Interface Systems
Speech interfaces are beneficial when they are more efficient than the alternatives. (If another method works, speech input is not used.)
Speech interfaces have been successfully deployed as part of interactive dialogue systems in limited task domains.
Ex: banking services, travel services.

Features of Successful Speech Interface
Restricted language
Incremental information transfer
Feedback (questionnaire)
Dialogue management

Problems of Integration
Speech recognition: limited vocabulary to gain accuracy.
Language understanding: limited knowledge to maximize understanding.
Interaction metaphor: who does the user talk to?

Speech Recognition
The most successful method for acoustic-phonetic modelling is the HMM (hidden Markov model).
An N-gram language model is also integrated into the recognition process.
• For the structure of an HMM-based + N-gram speech recognition engine, see:
  - http://java.cc.nccu.edu.tw/pr/report.htm
The result is a speaker-independent, continuous speech system.

Types of Speech Recognition Engines

Language Understanding
The system must interpret the user's intent from the speech it hears; only then can it change the virtual world according to that intent.

Interaction Metaphor
Direct manipulation: personal presence.
Various metaphors for spoken interaction have been proposed.
• Proxy
• Divinity
• Telekinesis
• Interface Agent

DIVE: Virtual Reality System
DIVE (Distributed Interactive Virtual Environment) is a multi-user virtual environment.
DIVE allows users and the environment to interact in real time.
DIVE contains a database composed of hierarchically organized objects.

DIVERSE
Augments DIVE by adding a multimodal interface.
The speech interface is part of the DIVERSE system.
Its focus was the “multimodal interface”, not the speech interface.

The DIVERSE System

DIVERSE Features
SR: Woodland; output is a text string
TTS: INFOVOX
Does not manage the interaction for users
• Does not confirm information.
• Does not correct errors when they arise.
• Always updates the world from the recognition results.
• No dialogue management.

Reference Resolution of DIVERSE
Object focus
Property perception

Object Focus
A combination of parameters.
These parameters have priorities and may persist or decay over time.
• Ex: An object that is being pointed at is more in focus than one that is merely in the visual field.
“Bring me that box!”
Priority and persistence of objects

Property Perception
A property holds of an object if the semantic value of this property is the “best fit” for that object.
Ex: move the “red” object.

Enhancing DIVERSE
SR and language understanding
Reference resolution
• Discourse modeling.
Robustness
• Confirmation and error handling.
Talking agents

SR and Language Understanding
One of the main weaknesses of DIVERSE is its SR accuracy.
Switch to a better SR engine.
Use pre-defined grammars to increase accuracy.

Discourse Modeling
The search is inefficient if the search space is always the whole virtual environment.
With discourse modeling, we can constrain the search space.
Ex: for “Bring me the cube.”, the reference of “the cube” should be resolved only among the objects in the user's view.

Talking Agents
In DIVERSE, all objects in the world are operated through a single agent.
• Interaction metaphor: Interface Agent.
The author divides all objects in the world into talking agents and non-talking agents.
• Interaction metaphor: Proxy.
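
Sketch: Object Focus and Discourse Modeling
The reference-resolution ideas from the Object Focus and Discourse Modeling slides can be illustrated with a small Python sketch. This is not the DIVERSE implementation: the class WorldObject, the focus sources, their priority values, and the decay factor are all assumptions chosen for illustration. The sketch shows focus parameters with priorities that decay over turns, and a search restricted to visible objects instead of the whole world.

```python
from dataclasses import dataclass, field

# Hypothetical focus sources and priorities (assumed values, not from DIVERSE).
FOCUS_PRIORITY = {"pointed_at": 3.0, "mentioned": 2.0, "in_view": 1.0}
DECAY = 0.5  # assumed per-turn decay factor


@dataclass
class WorldObject:
    name: str
    kind: str
    visible: bool = True
    focus: dict = field(default_factory=dict)  # focus source -> current weight

    def add_focus(self, source: str) -> None:
        """Set (or refresh) one focus parameter to its full priority."""
        self.focus[source] = FOCUS_PRIORITY[source]

    def score(self) -> float:
        """Overall focus is the combination (here: sum) of all parameters."""
        return sum(self.focus.values())


def next_turn(objects) -> None:
    """Let every focus parameter decay by one dialogue turn."""
    for obj in objects:
        obj.focus = {src: w * DECAY for src, w in obj.focus.items()}


def resolve(kind, objects):
    """Return the most focused visible object of the given kind, or None.

    Restricting candidates to visible objects stands in for the
    discourse-model constraint on the search space.
    """
    candidates = [o for o in objects if o.kind == kind and o.visible]
    if not candidates:
        return None
    return max(candidates, key=WorldObject.score)


# "Bring me that box!" while pointing at box_b.
box_a = WorldObject("box_a", "box")
box_b = WorldObject("box_b", "box")
box_a.add_focus("in_view")
box_b.add_focus("in_view")
box_b.add_focus("pointed_at")
print(resolve("box", [box_a, box_b]).name)
```

Summing the weights is only one possible way to combine the parameters; a system could equally take the maximum or use a learned weighting.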
Agent Modeling Framework
The parent agent consists of basic functions.
We can define a more specific agent by extending the parent agent.

Example: Launcher Agent
Launcher: launcher 476 here.
User: target red tank.
Launcher: please specify coordinates of red tank.
User: 437,342
Launcher: targeted red tank at 437,342. Launch missile?
User: confirm missile launch.
Launcher: missile launched. over and out?
User: over and out.

Conclusion
Combining speech with virtual worlds provides natural interaction.
Speech interfaces work well when they cooperate with other user interfaces.
The authors enhance DIVERSE to gain further benefits.

Q&A

Backup
DIVERSE System Architecture
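
Sketch: Agent Modeling Framework
As backup material for the Agent Modeling Framework and Launcher Agent slides, the following Python sketch shows one way a specific agent could extend a parent agent and reproduce the example dialogue. It is an illustration under assumptions, not the authors' framework: the class names Agent and LauncherAgent, the state names, the fallback reply "please repeat.", and the agent's final "over and out." reply are all hypothetical.

```python
import re


class Agent:
    """Parent agent: basic functions shared by all talking agents."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"{self.name} here."

    def handle(self, utterance):
        # Assumed default behaviour: ask the user to repeat.
        return "please repeat."


class LauncherAgent(Agent):
    """A more specific agent, defined by extending the parent agent."""

    def __init__(self, name):
        super().__init__(name)
        self.target = None
        self.coords = None
        self.state = "idle"  # idle -> need_coords -> need_confirm -> launched

    def handle(self, utterance):
        text = utterance.strip().lower().rstrip(".")
        if self.state == "idle" and text.startswith("target "):
            self.target = text[len("target "):]
            self.state = "need_coords"
            return f"please specify coordinates of {self.target}."
        if self.state == "need_coords":
            match = re.fullmatch(r"(\d+)\s*,\s*(\d+)", text)
            if match is None:
                # Incremental information transfer: keep asking for the gap.
                return f"please specify coordinates of {self.target}."
            self.coords = (int(match.group(1)), int(match.group(2)))
            self.state = "need_confirm"
            x, y = self.coords
            # Confirm before acting, as in the example dialogue.
            return f"targeted {self.target} at {x},{y}. launch missile?"
        if self.state == "need_confirm" and text == "confirm missile launch":
            self.state = "launched"
            return "missile launched. over and out?"
        if self.state == "launched" and text == "over and out":
            self.state = "idle"
            # Assumed closing reply; the slide's dialogue ends with the user.
            return "over and out."
        return super().handle(utterance)


launcher = LauncherAgent("launcher 476")
print("Launcher:", launcher.greet())
for line in ["target red tank.", "437,342", "confirm missile launch.", "over and out."]:
    print("User:", line)
    print("Launcher:", launcher.handle(line))
```

Keeping the dialogue state inside the agent means each talking agent confirms and recovers from errors on its own, which is the robustness the Enhancing DIVERSE slide asks for.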