LT2206 Dialogue Systems
Staffan Larsson
Department of Philosophy, Linguistics and Theory of Science
January 17, 2017

Outline
- Syllabus
- Introduction to Dialogue Systems
- Course project

Syllabus

Course homepage on GUL
https://gul.gu.se/public/courseId/77645

Previous course evaluation and resulting improvements
- Lab supervision: new supervisors, improved support for lab supervisors
- Voxeo platform problems: alternatives have been investigated, but no better alternative was found
  - ASR not top-notch
  - May take time from changes in code until changes take effect
  - Occasionally unstable
- Grading was previously based only on the exam. New grading model based on exam + labs + project (2.5 HEC each, 50% VG required for VG on course).

Introduction to Dialogue Systems

What are dialogue systems?
- Dia logos = through language
- Natural language interfaces enable the user to communicate with the computer in French, English, German, or another human language.
- Some applications of such interfaces are
  - database queries
  - information retrieval from texts
  - so-called expert systems
  - robot control
  - computer games
  - in-vehicle infotainment
- Spoken language may be combined with other modes of communication, such as pointing with a mouse or finger: multimodality

Why build dialogue systems?
- Theoretical purpose: test theories
  - e.g. what kind of information does the system need to keep track of?
  - However, a dialogue system is a complex system with many components: how to evaluate it? (Turing test not so useful)
  - What are the theories about?
- Practical purpose: better human-computer interaction

Why spoken interaction?
- Spoken interaction is the natural way for humans to interact
  - computers should adapt to humans rather than the other way around
  - important to enable systems to interact in a natural way
- Wireless devices have limited input capabilities (buttons, touch screen)
- Devices with more functions than buttons / screen space
- Telephone keypad can give users only a limited number of choices
- There exist many more telephones than computers with the potential to access the Internet
- Users want hands-free and/or eyes-free use
- From a business viewpoint, voice applications open up a host of new revenue opportunities

History of dialogue systems
- ELIZA (Weizenbaum 1966)
  - text dialogue
  - simulated psychoanalyst
- SHRDLU (Winograd 1972)
  - text dialogue
  - blocks world
- TRAINS (Allen et al 1991)
  - spoken dialogue
  - joint planning
- CSLU Toolkit (McTear 1993)
  - general platform
  - finite state
- Philips train timetable system (Aust et al 1994)
  - speech over phone
  - first deployed system
- Linguatronics (1996)
  - in-car spoken dialogue
  - dialing etc
- VoiceXML (W3C 2000)
  - general platform
  - form-filling dialogue
- Siri (Apple 2009)
  - smartphone-based
  - multimodal
- Houndify (2015), Amazon Alexa (2015)
  - proprietary platforms
  - open for third party development
  - ?
- Talkamatic (2016)
  - flexible multimodal dialogue
  - open platform

Types of dialogue systems
- by modality
  - text-based
  - spoken
  - multi-modal
- by device
  - smartphone-based systems
  - in-car systems
  - “call-centre” systems
  - robot systems
  - desktop/laptop systems
    - native
    - in-browser
    - in-virtual environment
- by style
  - command-based
  - menu-driven
  - flexible
- by initiative
  - system initiative
  - user initiative
  - mixed initiative
- by application
  - information service
  - transaction-based
  - command-and-control
  - entertainment
  - educational/tutorial
  - edutainment
  - reminder systems
  - companion systems
  - healthcare
  - eldercare
  - assistive/access systems

More on application types
- information service
  - weather reports
  - stock quotes
  - time tables
  - ...
- transaction-based
  - shopping
  - financial transactions
  - travel reservations
  - ...

Spoken interaction vs. IVR
[Two figure slides]

Dialogue systems architecture
[Figure]

Language modules in a dialogue system
[Diagram. Modules: Speech recognizer/synthesizer, Morphological analyzer/generator, Syntactic parser/generator, Semantic analyzer/reasoner, Pragmatic analyzer/planner. Resources: Lexicon, Grammar, Knowledge base, Dialogue state. Endpoints: Speech input, Speech output.]

Finite state dialogue management
[Figure]
- Represents dialogue flow using a Finite State Automaton (FSA)
  - (actually, FSA plus variable assignment)
  - States of FSA: questions to the user
  - Arcs: actions to take depending on user response
- System initiative (“single initiative”) dialogue
  - System has all the initiative; ignores or misinterprets anything which is not a direct answer to a system question
  - Human-human conversation is “mixed initiative”; initiative shifts back and forth between participants
- Advantages of single initiative
  - ASR needs only listen for answers to the question just asked
  - NLU becomes simpler
- Many FSA dialogue systems also allow “universal” commands
  - can be said at any time, e.g. “help”, “start over”, “main menu”
- FSA systems may be sufficient for some very simple tasks, e.g. entering a credit card number
- Insufficient for e.g. a travel agent system
- Users often want to express their travel goals with complex sentences that may answer more than one question at a time
  - Hi I’d like to fly to Seattle Tuesday morning
  - I want a flight from Milwaukee to Orlando one way leaving after five p.m. on Wednesday.
- FSA systems can’t handle these kinds of utterances
- Limitations of FSA dialogue models
  - require that the user answer each question as it is asked
  - theoretically possible to create an FSA which has a separate state for each possible subset of questions that the user’s statement could be answering
  - but this would require a vast explosion in the number of states, making this a difficult architecture to conceptualize, modify, and debug
- Most systems avoid the pure system-initiative finite-state approach and use an architecture that allows mixed initiative
  - conversational initiative can shift between the system and user at various points in the dialogue.

Form-based dialogue management
- Common mixed initiative dialogue architecture
- Relies on the structure of the frame itself to guide the dialogue.
- Asks the user questions to fill slots in the frame
  - but allows the user to guide the dialogue by giving information that fills other slots in the frame
- Each slot may be associated with a question to ask the user, of the following type:
  - ORIGIN CITY: “From what city are you leaving?”
  - DESTINATION CITY: “Where are you going?”
  - DEPARTURE TIME: “When would you like to leave?”
  - ARRIVAL TIME: “When do you want to arrive?”
- A frame-based dialogue manager thus needs to ask questions of the user, filling any slot that the user specifies, until it has enough information to perform a database query, and then return the result to the user
- If the user happens to answer two or three questions at a time, the system has to fill in these slots and then remember not to ask the user the associated questions for the slots.
- Does away with the strict constraints that the finite-state manager imposes on the order in which the user can specify information.
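
The slot-filling behaviour described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the slot names and prompts follow the slide, but the regular-expression patterns, the function names (`new_frame`, `fill_slots`, `next_slot`) and the typed-text input are hypothetical stand-ins for a real speech recognizer and NLU component.

```python
import re

# Slots and prompts follow the slide; dict order decides which question is asked first.
PROMPTS = {
    "origin": "From what city are you leaving?",
    "destination": "Where are you going?",
    "departure_time": "When would you like to leave?",
    "arrival_time": "When do you want to arrive?",
}

# Hypothetical keyword patterns standing in for a real NLU component.
# City names are assumed to be single capitalized words.
PATTERNS = {
    "origin": re.compile(r"\bfrom ([A-Z]\w+)"),
    "destination": re.compile(r"\bto ([A-Z]\w+)"),
    "departure_time": re.compile(r"\bleaving (?:on |after |at )?(\w+)"),
    "arrival_time": re.compile(r"\barriving (?:on |before |at )?(\w+)"),
}


def new_frame():
    """Create an empty frame with every slot unfilled."""
    return {slot: None for slot in PROMPTS}


def fill_slots(frame, utterance, asked_slot=None):
    """Fill every slot the utterance mentions, not only the one just asked about."""
    matched = False
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            frame[slot] = match.group(1)
            matched = True
    # A bare answer such as "Friday evening" is taken as the answer to the
    # question that was just asked.
    if not matched and asked_slot is not None:
        frame[asked_slot] = utterance.strip()
    return frame


def next_slot(frame):
    """Return the first unfilled slot, or None when the frame is complete."""
    for slot in PROMPTS:
        if frame[slot] is None:
            return slot
    return None


if __name__ == "__main__":
    frame = new_frame()
    fill_slots(frame, "I want a flight from Milwaukee to Orlando leaving Wednesday")
    slot = next_slot(frame)
    print(frame)
    print(PROMPTS[slot] if slot else "Frame complete: run the database query.")
```

Because `next_slot` only looks at slots that are still empty, the system never asks about information the user has already volunteered, which is the key difference from the finite-state approach.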
- Some domains seem to require the ability to deal with multiple frames, e.g.
  - general route information (for questions like “Which airlines fly from Boston to San Francisco?”)
  - information about airfare practices (for questions like “Do I have to stay a specific number of days to get a decent airfare?”)
  - questions about car or hotel reservations
- Since users may switch from frame to frame, the system must be able to
  - disambiguate which slot of which frame a given input is supposed to fill
  - switch dialogue control to that frame.
- VoiceXML
  - Voice Extensible Markup Language
  - an XML-based dialogue design language
  - released by the W3C, the most commonly used of the various speech markup languages (such as SALT)
- Goal: to create simple spoken dialogues
- very simple mixed-initiative
- form-based architecture

Complexity levels in dialogue systems
[Figure]

Methods in LT
- Rule-based
- Statistical
- Hybrid

Rule-based methods
Example: suppose you want to translate from English to French
- create a lexicon for English, and a corresponding one for French
- write grammar rules for both languages
- ... perhaps relating English and French to a semantic representation
- ... or transfer rules which relate English and French constructions
- Most dialogue systems are completely or mostly rule-based

Statistical methods
Example: suppose you want to translate from English to French
- collect lots of examples of translations from English to French
- ... perhaps from a parallel corpus (with word alignment)
- try to match new examples with old examples
- use machine learning techniques based on statistical models of your data
- Over the last 5-10 years there has been a focus on statistical methods for dialogue management, but the complexity of dialogue management has led to doubts about the prospects of such methods
- Hybrid systems

Some challenges for dialogue systems
- Increased interactivity (fast turn-taking, parallel feedback etc.)
- Learning and adapting to the user’s language, based on interaction
- Connecting language to perception and to the situation at hand
  - Cf. Google Glass
- Minimizing the cognitive load imposed by interaction
  - E.g. in in-car systems
- Faster and cheaper domain adaptation
  - Easy flexible dialogue scripting
  - Automatically learning interaction patterns (instead of, or in addition to, scripting)
- Improved speech recognition
  - Faster, more accurate, open domain, incremental, keep prosody
- Improved speech synthesis
  - Control prosody; mixing languages (codeswitching)

Dialogue systems and human dialogue
- Although existing dialogue systems are far from achieving human ability, they have numerous possible applications
- Today’s computers do not understand our language, but computer languages are difficult to learn and do not correspond to the structure of human thought
- Even if the language the machine understands and its domain of discourse are very restricted, the ability to use human language can help make software and services easier to use
- Communication with computers using spoken language will have a lasting impact upon the work environment
- Completely new areas of application for information technology will open up.

Multimodality
- In our communication we mix language with other modes of communication and other information media.
- We combine speech with gesture and facial expressions.
- Digital texts are combined with pictures and sounds.
- Thus speech and text technologies overlap and interact with many other technologies that facilitate processing of multimodal communication and multimedia documents.

What are the big players up to?
⇒ GUL literature list

Dialogue systems in Gothenburg
- CLT Dialogue technology lab
  - Dialogue systems
  - Formal models of dialogue
  - Applied speech technology
  - Dialogue corpora and dialogue analysis
  - The Spoken Web
  - Mobile communication studies
- CLASP: Centre for Linguistic Theory and Studies in Probability
- Talkamatic AB: The Talkamatic dialogue system

Course project
- Design your own voice-based game application
- Implement it in Tropo or VoiceXML
- Work in groups of two.
- Your game does not have to be big (this is a small project), but hopefully you can come up with something original, something that works well using voice only.
- Start by drawing a state machine or call graph that captures the dialogue flow and then implement it using the techniques you have learned.
- Next time: present your project ideas!
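
As a starting point for the state machine step, a finite-state dialogue flow can be written down as a plain table of states and arcs before it is ported to Tropo or VoiceXML. The sketch below is a rough Python illustration only: the game content, the `STATES` and `UNIVERSALS` tables, and the `step` and `run` functions are all invented for this example, and typed strings stand in for recognized speech. It does not use the Tropo or VoiceXML APIs.

```python
# Each state asks one question; arcs map a recognized answer to the next state.
# The game itself is invented purely for illustration.
STATES = {
    "start": {
        "prompt": "You are in a hallway. Do you go left or right?",
        "arcs": {"left": "library", "right": "kitchen"},
    },
    "library": {
        "prompt": "You are in the library. Do you read or leave?",
        "arcs": {"read": "win", "leave": "start"},
    },
    "kitchen": {
        "prompt": "You are in the kitchen. Do you eat or leave?",
        "arcs": {"eat": "lose", "leave": "start"},
    },
    "win": {"prompt": "You found the secret. You win!", "arcs": {}},
    "lose": {"prompt": "The food was poisoned. Game over.", "arcs": {}},
}

# "Universal" commands that can be said in any non-final state.
UNIVERSALS = {"start over": "start"}


def step(state, answer):
    """Return the next state; stay in the same state if the answer is not understood."""
    answer = answer.strip().lower()
    if answer in UNIVERSALS:
        return UNIVERSALS[answer]
    return STATES[state]["arcs"].get(answer, state)


def run(answers, state="start"):
    """Feed a list of answers through the automaton and return the visited states."""
    visited = [state]
    for answer in answers:
        if not STATES[state]["arcs"]:  # final state: no outgoing arcs
            break
        state = step(state, answer)
        visited.append(state)
    return visited


if __name__ == "__main__":
    for name in run(["right", "banana", "start over", "left", "read"]):
        print(name, "->", STATES[name]["prompt"])
```

Note how an answer the automaton does not expect ("banana") simply repeats the current question: this is the system-initiative behaviour discussed earlier, and a reason to keep each state's question narrow in a voice-only game.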