Lecture notes

LT2206 Dialogue Systems
Staffan Larsson
Department of Philosophy, Linguistics and Theory of Science
January 17, 2017
Outline
Syllabus
Introduction to Dialogue Systems
Course project
Syllabus
Course homepage on GUL
https://gul.gu.se/public/courseId/77645

Previous course evaluation and resulting improvements
- Lab supervision: new supervisors, improved support for lab supervisors
- Voxeo platform problems: alternatives have been investigated but no better alternative found
  - ASR not top-notch
  - May take time from changes in code until changes take effect
  - Occasionally unstable
- Grading previously only based on exam. New grading model based on exam + labs + project (2.5 HEC each, 50% VG required for VG on course).
Introduction to Dialogue Systems

What are dialogue systems?
- Dia logos = through language
- Natural language interfaces enable the user to communicate with the computer in French, English, German, or another human language.
- Some applications of such interfaces are:
  - database queries
  - information retrieval from texts
  - so-called expert systems
  - robot control
  - computer games
  - in-vehicle infotainment
- Spoken language may be combined with other modes of communication such as pointing with mouse or finger (multimodality)

Why build dialogue systems?
- Theoretical purpose: test theories
  - e.g. what kind of information does the system need to keep track of?
  - However, a dialogue system is a complex system with many components, so how to evaluate it? (Turing test not so useful)
  - What are the theories about?
- Practical purpose: better human-computer interaction

Why spoken interaction?
- Spoken interaction is the natural way for humans to interact
  - computers should adapt to humans rather than the other way around
  - important to enable systems to interact in a natural way
- Wireless devices have limited input capabilities (buttons, touch screen)
- Devices with more functions than buttons / screen space
- Telephone keypad can give users only a limited number of choices
- There exist many more telephones than computers with the potential to access the Internet
- Users want hands-free and/or eyes-free use
- From a business viewpoint, voice applications open up a host of new revenue opportunities

History of dialogue systems
- ELIZA (Weizenbaum 1966): text dialogue; simulated psychoanalyst
- SHRDLU (Winograd 1972): text dialogue; blocks world
- TRAINS (Allen et al 1991): spoken dialogue; joint planning
- CSLU Toolkit (McTear 1993): general platform; finite state
- Philips train timetable system (Aust et al 1994): speech over phone; first deployed system
- Linguatronics (1996): in-car spoken dialogue; dialing etc
- VoiceXML (W3C 2000): general platform; form-filling dialogue
- Siri (Apple 2009): smartphone-based; multimodal
- Houndify (2015), Amazon Alexa (2015): proprietary platforms open for third party development
- ? Talkamatic (2016): flexible multimodal dialogue; open platform

Types of dialogue systems
- by modality
  - text-based
  - spoken
  - multi-modal
- by device
  - smartphone-based systems
  - in-car systems
  - “call-centre” systems
  - robot systems
  - desktop/laptop systems
    - native
    - in-browser
    - in-virtual environment
- by style
  - command-based
  - menu-driven
  - flexible
- by initiative
  - system initiative
  - user initiative
  - mixed initiative
- by application
  - information service
  - transaction-based
  - command-and-control
  - entertainment
  - educational/tutorial
  - edutainment
  - reminder systems
  - companion systems
  - healthcare
  - eldercare
  - assistive/access systems

More on application types
- information service
  - weather reports
  - stock quotes
  - time tables
  - ...
- transaction-based
  - shopping
  - financial transactions
  - travel reservations
  - ...
Spoken interaction vs. IVR
Dialogue systems architecture

Language modules in a dialogue system
[Figure: speech input and speech output connect to a stack of modules (speech recognizer/synthesizer, morphological analyzer/generator, syntactic parser/generator, semantic analyzer/reasoner, pragmatic analyzer/planner), shown alongside the lexicon, grammar, knowledge base and dialogue state.]

Finite state dialogue management
- Represents dialogue flow using a Finite State Automaton
  - (actually, FSA plus variable assignment)
  - States of FSA: questions to the user
  - Arcs: actions to take depending on user response
- System initiative (“single initiative”) dialogue
  - System has all the initiative; ignores or misinterprets anything which is not a direct answer to a system question
  - Human-human conversation is “mixed initiative”; initiative shifts back and forth between participants
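
The finite-state idea above can be sketched in a few lines of Python. This is only an illustration under assumptions, not a description of any particular platform: the state names, the `State` tuple, the `run` function and the three-question flow are all made up for the example. Each state holds a question, an optional variable to assign, and an arc function that picks the next state from the user's answer.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class State(NamedTuple):
    question: str                        # what the system asks in this state
    variable: Optional[str]              # variable assigned from the answer, if any
    arc: Callable[[str], Optional[str]]  # maps the answer to the next state (None = stop)


# Hypothetical dialogue flow: two questions followed by a yes/no confirmation.
STATES: Dict[str, State] = {
    "origin": State("From what city are you leaving?", "origin",
                    lambda answer: "destination"),
    "destination": State("Where are you going?", "destination",
                         lambda answer: "confirm"),
    "confirm": State("Is that correct?", None,
                     lambda answer: None if answer.strip().lower() == "yes" else "origin"),
}


def run(answers: List[str], start: str = "origin") -> Tuple[Dict[str, str], List[str]]:
    """Walk the automaton, consuming exactly one user answer per question."""
    variables: Dict[str, str] = {}
    asked: List[str] = []
    replies = iter(answers)
    current: Optional[str] = start
    while current is not None:
        state = STATES[current]
        asked.append(state.question)
        answer = next(replies)
        if state.variable is not None:
            variables[state.variable] = answer  # the "variable assignment" part
        current = state.arc(answer)             # follow the arc chosen by the answer
    return variables, asked


if __name__ == "__main__":
    print(run(["Boston", "Seattle", "yes"]))
```

Note that the whole answer is stored as the value of the current variable, so anything other than a direct answer to the question just asked is misinterpreted. That is the single-initiative behaviour described above.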
- Advantages of single initiative
  - ASR needs only listen for answers to the question just asked
  - NLU becomes simpler
- Many FSA dialogue systems also allow “universal” commands
  - can be said at any time, e.g. “help”, “start over”, “main menu”
- FSA systems may be sufficient for some very simple tasks, e.g. entering a credit card number
- Insufficient for e.g. travel agent system
- Users often want to express their travel goals with complex sentences that may answer more than one question at a time
  - Hi I’d like to fly to Seattle Tuesday morning
  - I want a flight from Milwaukee to Orlando one way leaving after five p.m. on Wednesday.
- FSA systems can’t handle these kinds of utterances
- Limitations of FSA dialogue models
  - require that the user answer each question as it is asked
  - theoretically possible to create an FSA which has a separate state for each possible subset of questions that the user’s statement could be answering
  - would require a vast explosion in the number of states, making this a difficult architecture to conceptualize, modify, and debug
- Most systems avoid the pure system-initiative finite-state approach and use an architecture that allows mixed initiative
  - conversational initiative can shift between the system and user at various points in the dialogue.
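
To get a feeling for the state explosion, one can count the subsets of questions that a single utterance might answer. The sketch below is a rough illustration only: the function name `answer_subsets` and the four question names are assumptions for the example. For n questions there are 2^n - 1 non-empty subsets, so the count doubles with every added question.

```python
from itertools import combinations
from typing import List, Sequence, Set


def answer_subsets(questions: Sequence[str]) -> List[Set[str]]:
    """All non-empty subsets of questions that one utterance could answer."""
    return [set(combo)
            for size in range(1, len(questions) + 1)
            for combo in combinations(questions, size)]


if __name__ == "__main__":
    questions = ["origin", "destination", "departure_time", "arrival_time"]
    print(len(answer_subsets(questions)))  # 15, i.e. 2**4 - 1
```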

Form-based dialogue management
- Common mixed initiative dialogue architecture
- Relies on the structure of the frame itself to guide the dialogue.
- Asks the user questions to fill slots in the frame
  - but allows the user to guide the dialogue by giving information that fills other slots in the frame
- Each slot may be associated with a question to ask the user, of the following type:
  - ORIGIN CITY: “From what city are you leaving?”
  - DESTINATION CITY: “Where are you going?”
  - DEPARTURE TIME: “When would you like to leave?”
  - ARRIVAL TIME: “When do you want to arrive?”
- A frame-based dialogue manager thus needs to ask questions of the user, filling any slot that the user specifies, until it has enough information to perform a database query, and then return the result to the user
- If the user happens to answer two or three questions at a time, the system has to fill in these slots and then remember not to ask the user the associated questions for the slots.
- Does away with the strict constraints that the finite-state manager imposes on the order that the user can specify information.
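
A minimal sketch of the frame-based control loop described above, in Python. The slot names and questions are the ones from the slide. Everything else is an assumption made for illustration: the regular expressions in `PATTERNS` are a toy stand-in for real NLU, `REQUIRED` is a guess at what "enough information" for the database query means, and the names `parse` and `run_dialogue` are made up.

```python
import re
from typing import Dict, List, Optional, Tuple

# Slot questions taken from the slide.
QUESTIONS: Dict[str, str] = {
    "ORIGIN CITY": "From what city are you leaving?",
    "DESTINATION CITY": "Where are you going?",
    "DEPARTURE TIME": "When would you like to leave?",
    "ARRIVAL TIME": "When do you want to arrive?",
}

# Hypothetical toy patterns standing in for a real NLU component.
PATTERNS = {
    "ORIGIN CITY": re.compile(r"\bfrom ([A-Z]\w+)"),
    "DESTINATION CITY": re.compile(r"\bto ([A-Z]\w+)"),
    "DEPARTURE TIME": re.compile(r"\bleaving ((?:after |at |on )?\w+(?: [ap]\.m\.)?)"),
    "ARRIVAL TIME": re.compile(r"\barriving ((?:before |at |on )?\w+(?: [ap]\.m\.)?)"),
}

# Assumption: these slots are enough to run the database query.
REQUIRED = ("ORIGIN CITY", "DESTINATION CITY", "DEPARTURE TIME")


def parse(utterance: str) -> Dict[str, str]:
    """Return every slot value that the toy patterns find in the utterance."""
    found: Dict[str, str] = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            found[slot] = match.group(1)
    return found


def run_dialogue(utterances: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Fill the frame from user utterances; ask only about slots still empty."""
    frame: Dict[str, str] = {}
    asked: List[str] = []
    pending: Optional[str] = None  # slot the system asked about last
    for utterance in utterances:
        found = parse(utterance)
        if found:
            frame.update(found)                  # user may fill several slots at once
        elif pending is not None:
            frame[pending] = utterance.strip()   # treat as direct answer to the question
        pending = next((slot for slot in REQUIRED if slot not in frame), None)
        if pending is None:
            break                                # enough information for the query
        asked.append(QUESTIONS[pending])
    return frame, asked


if __name__ == "__main__":
    print(run_dialogue(["Hi I'd like to fly to Seattle Tuesday morning",
                        "Boston", "leaving at nine"]))
```

In the example run the system never asks "Where are you going?", because the first utterance already filled that slot. The toy patterns miss "Tuesday morning", so the departure time is still asked for; a real NLU component would be expected to do better.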
- Some domains seem to require the ability to deal with multiple frames, e.g.
  - general route information (for questions like Which airlines fly from Boston to San Francisco?)
  - information about airfare practices (for questions like Do I have to stay a specific number of days to get a decent airfare?)
  - questions about car or hotel reservations
- Since users may switch from frame to frame, the system must be able to
  - disambiguate which slot of which frame a given input is supposed to fill
  - switch dialogue control to that frame.
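
One very simple way to picture frame disambiguation and switching is to score each frame by how many of its slots an utterance seems to touch, and to move control to the best-scoring frame. The Python sketch below is only that, a picture: the frame names, slot names and cue words in `FRAMES` are invented for the example, and cue-word matching is a crude assumption in place of real language understanding.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical frames: each slot lists cue words that suggest the slot is addressed.
FRAMES: Dict[str, Dict[str, List[str]]] = {
    "route_info": {"airline": ["airline", "airlines"], "route": ["fly from", "fly to"]},
    "fare_practices": {"min_stay": ["stay"], "fare": ["airfare", "fare"]},
    "reservation": {"car": ["car"], "hotel": ["hotel"]},
}


def matching_slots(utterance: str, frame: Dict[str, List[str]]) -> List[str]:
    """Slots of one frame whose cue words occur as whole words in the utterance."""
    return [slot for slot, cues in frame.items()
            if any(re.search(r"\b" + re.escape(cue) + r"\b", utterance, re.IGNORECASE)
                   for cue in cues)]


def select_frame(utterance: str,
                 active: Optional[str] = None) -> Tuple[Optional[str], List[str]]:
    """Pick the frame with the most matching slots; keep the active frame otherwise."""
    best = active
    best_slots = matching_slots(utterance, FRAMES[active]) if active is not None else []
    for name, frame in FRAMES.items():
        slots = matching_slots(utterance, frame)
        if len(slots) > len(best_slots):
            best, best_slots = name, slots  # switch dialogue control to this frame
    return best, best_slots


if __name__ == "__main__":
    active = None
    for utterance in ["Which airlines fly from Boston to San Francisco?",
                      "Do I have to stay a specific number of days to get a decent airfare?",
                      "I also need a hotel"]:
        active, slots = select_frame(utterance, active)
        print(active, slots)
```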
- VoiceXML
  - Voice Extensible Markup Language
  - an XML-based dialogue design language released by the W3C,
  - the most commonly used of the various speech markup languages (such as SALT)
- Goal: to create simple spoken dialogues
- very simple mixed-initiative
- form-based architecture

Complexity levels in dialogue systems

Methods in LT
- Rule-based
- Statistical
- Hybrid

Rule-based methods
Example: suppose you want to translate from English to French
- create a lexicon for English, and a corresponding one for French
- write grammar rules for both languages
- ... perhaps relating English and French to a semantic representation
- ... or transfer rules which relate English and French constructions
- Most dialogue systems are completely or mostly rule-based

Statistical methods
Example: suppose you want to translate from English to French
- collect lots of examples of translations from English to French
- ... perhaps from a parallel corpus (with word alignment)
- try to match new examples with old examples
- use machine learning techniques based on statistical models of your data
- Over the last 5-10 years there has been a focus on statistical methods for dialogue management, but the complexity of dialogue management has led to doubts about the prospects of such methods
- Hybrid systems

Some challenges for dialogue systems
- Increased interactivity (fast turn-taking, parallel feedback etc.)
- Learning and adapting to the user’s language, based on interaction
- Connecting language to perception and to the situation at hand
  - Cf. Google Glass
- Minimizing the cognitive load imposed by interaction
  - E.g. in in-car systems
- Easy flexible dialogue scripting
- Automatically learning interaction patterns (instead of, or in addition to, scripting)
- Faster and cheaper domain adaptation
- Improved speech recognition
  - Faster, more accurate, open domain, incremental, keep prosody
- Improved speech synthesis
  - Control prosody; mixing languages (codeswitching)

Dialogue systems and human dialogue
- Although existing dialogue systems are far from achieving human ability, they have numerous possible applications
- Today’s computers do not understand our language but computer languages are difficult to learn and do not correspond to the structure of human thought
- Even if the language the machine understands and its domain of discourse are very restricted, the ability to use human language can help make software and services easier to use
- Communication with computers using spoken language will have a lasting impact upon the work environment
- Completely new areas of application for information technology will open up.

Multimodality
- In our communication we mix language with other modes of communication and other information media.
- We combine speech with gesture and facial expressions.
- Digital texts are combined with pictures and sounds.
- Thus speech and text technologies overlap and interact with many other technologies that facilitate processing of multimodal communication and multimedia documents.
What are the big players up to?
⇒ GUL literature list

Dialogue systems in Gothenburg
- CLT Dialogue technology lab
  - Dialogue systems
  - Formal models of dialogue
  - Applied speech technology
  - Dialogue corpora and dialogue analysis
  - The Spoken Web
  - Mobile communication studies
- CLASP: Centre for Linguistic Theory and Studies in Probability
- Talkamatic AB: The Talkamatic dialogue system
Course project
- Design your own voice-based game application
- Implement it in Tropo or VoiceXML
- Work in groups of two.
- Your game does not have to be big (this is a small project), but hopefully you can come up with something original, something that works well using voice only.
- Start by drawing a state machine or call graph that captures the dialogue flow and then implement it using the techniques you have learned.
- Next time: present your project ideas!