An anatomy of graceful interaction in spoken and written man

Carnegie Mellon University
Research Showcase @ CMU
Computer Science Department
School of Computer Science
1979
An anatomy of graceful interaction in spoken and
written man-machine communication
Philip J. Hayes
Carnegie Mellon University
Dabbala Rajagopal Reddy
Follow this and additional works at: http://repository.cmu.edu/compsci
This Technical Report is brought to you for free and open access by the School of Computer Science at Research Showcase @ CMU. It has been
accepted for inclusion in Computer Science Department by an authorized administrator of Research Showcase @ CMU. For more information, please
contact [email protected].
NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS:
The copyright law of the United States (title 17, U.S. Code) governs the making
o f photocopies or other reproductions of copyrighted material. Any copying of this
document without permission of its author may be prohibited by law.
CMU-CS-79-144
An Anatomy of Graceful Interaction
in Spoken and Written
Man-Machine Communication
Phil Hayes
Raj Reddy
13 September 1979
Computer Science Department
Carnegie-Mellon University
Pittsburgh, P A 15213
This
ARPA
research
Order
w a s s p o n s o r e d by the Defense Advanced Research P r o j e c t s
No, 3 5 9 7 , monitored b y
the Air
Force
Avionics
Laboratory
Agency
Under
(DOD),
Contract
F33615-78-C-1551.
T h e v i e w s a n d conclusions contained in this document are those of the authors a n d s h o u l d
n o t b e i n t e r p r e t e d as r e p r e s e n t i n g the official policies, either e x p r e s s e d o r implied, of
D e f e n s e A d v a n c e d R e s e a r c h P r o j e c t s Agency or the US Government.
UNIVERSITY LIBRARIES
CARNEGIE-MELLON UNtVERSITT
PITTSBURGH. PENNSYLVANIA 15211
the
Abstract
T h e r e h a v e r e c e n t l y b e e n a number of attempts to provide natural and flexible
interfaces
t o c o m p u t e r s y s t e m s t h r o u g h the medium of natural language. While such i n t e r f a c e s t y p i c a l l y
p e r f o r m w e l l in r e s p o n s e to straightforward requests and questions w i t h i n their d o m a i n o f
discourse,
they
circumstances.
often
fail
to
interact
gracefully
with
Most c u r r e n t systems cannot, for instance:
their
users
in
less
predictable
r e s p o n d r e a s o n a b l y to input
not
c o n f o r m i n g to a r i g i d grammar; ask for and understand clarification if their u s e r ' s input
u n c l e a r ; o f f e r c l a r i f i c a t i o n of their o w n output if the user asks for it; or interact t o
is
resolve
a n y a m b i g u i t i e s that may arise w h e n the user attempts to describe things to the s y s t e m .
We
arise
b e l i e v e that g r a c e f u l interaction in these and the many other c o n t i n g e n c i e s that
can
i n human c o n v e r s a t i o n is essential if interfaces are e v e r to appear c o o p e r a t i v e
and
h e l p f u l , and h e n c e
e x p e r i e n c e d user.
be suitable for the casual or
naive user, and more
habitable
for
the
In this paper; w e attempt to circumscribe graceful interaction as a f i e l d f o r
s t u d y , a n d i d e n t i f y the problems involved in achieving it.
To
skills:
this e n d w e d e c o m p o s e graceful interaction into a number of r e l a t i v e l y
independent
skills i n v o l v e d in parsing elliptical, fragmented, and otherwise ungrammatical input; in
e n s u r i n g r o b u s t communication; in explaining abilities and limitations, actions and t h e m o t i v e s
b e h i n d them; in k e e p i n g track of the focus of attention of a dialogue; in i d e n t i f y i n g
things
from
terms
descriptions,
even
if
ambiguous
a p p r o p r i a t e f o r the context.
interaction
domains.
and w e
and
sufficient
for
or
unsatisfiable; and in d e s c r i b i n g
things
in
We claim these skills are necessary for any t y p e of
graceful
interaction
in a certain large
class
of
graceful
application
N o n e of these components is individually much b e y o n d the current state of t h e art,
outline
the architecture of a system that integrates them all.
Thus, w e
propose
g r a c e f u l i n t e r a c t i o n as an idea of great practical utility whose time has come and w h i c h
r i p e f o r implementation.
We are currently implementing a gracefully interacting s y s t e m a l o n g
t h e l i n e s p r e s e n t e d ; the s y s t e m will initially deal with t y p e d input, but is e v e n t u a l l y
t o accept natural speech.
is
intended
Table of Contents
Introduction
1. C o m p o n e n t s of Graceful Interaction
2. R o b u s t Communication
2.1. I n t r o d u c t i o n - The Principle of Implicit Confirmation
2.2. Implicit Acknowledgements
2.3. Explicit Indications of Incomprehension
2.4. E c h o i n g and the Use of Fragmentary Recognition
2.5. Summary of Robust Communication
3. F l e x i b l e P a r s i n g
3.1. Grammatical Deviations
3.2. Implications for Language Analysers
3.3. Summary of Flexible Parsing
4. Domain K n o w l e d g e
4.1. T h e Role of Domain Knowledge
4.2. Simple S e r v i c e s
4.3. R e p r e s e n t a t i o n s for Simple Service Domains
5. E x p l a n a t i o n Facility
5.1. Explanation T y p e s
5.2. Q u e s t i o n s A b o u t Ability - Indirect Speech Acts
5.3. Q u e s t i o n s About Ability in a-Restricted Domain
5.4.
5.5.
5.6.
5.7.
Event Questions
H y p o t h e t i c a l Questions
A b i l i t y M o d e l s as a Safety Net
Summary of Explanation Facilities
6. G o a l s and F o c u s
6.1. Goals
6.2. S u b g o a l s In Simple Service Domains
6.3. F o c u s
6.4. K e e p i n g track of focus
6.5. Summary of Goals and Focus
7. I d e n t i f i c a t i o n from Descriptions
7.1. T h e Identification Problem
7.2. A m b i g u o u s Descriptions
7.3. Unsatisfiable Descriptions
7.4. D e s c r i p t i o n s and Faulty Comprehension
7.5. Summary of Identification from Descriptions
8. Language Generation
9. R e a l i z i n g G r a c e f u l Interaction
9.1. A r c h i t e c t u r e of a Gracefully Interacting System
9.2. W o r k e d Example
Conclusion
1
Introduction
A
great
deal
of
interest
has
recently
been
shown
in providing
natural
and
flexible
c o m p u t e r i n t e r f a c e s t h r o u g h systems capable of engaging in a dialogue w i t h a human b e i n g in
m o r e o r l e s s natural language.
This interest is embodied in numerous systems i n c l u d i n g G U S
[ 2 ] , L I F E R [ 2 0 ] , P A L [37], P A R R Y [30], and PLANES [40], and in the w o r k of C o d d [6], G r o s z
[14],
and
others.
Such
systems
typically
respond
accurately
and
appropriately
to
s t r a i g h t f o r w a r d r e q u e s t s and questions or otherwise uneventful dialogue w i t h i n t h e i r d o m a i n
of
discourse.
Compared
language
interfaces
instance:
respond
to
appears
human beings, however, the performance
quite
reasonably
rigid
to input
and fragile.
Most
not conforming
current
of
current
systems
natural
cannot,
to a rigid grammar; ask
for
for
and
u n d e r s t a n d c l a r i f i c a t i o n if their user's input is unclear; o f f e r clarification of their o w n o u t p u t
if t h e u s e r asks f o r it; o r interact to resolve any ambiguities that may arise w h e n t h e u s e r
attempts
to
describe
things
naturally
and gracefully
to the system.
A dialogue
system cannot
hope
to
interact
with its users, unless it can cope w i t h these and the many
other
c o n t i n g e n c i e s , s o common in o r d i n a r y human conversation.
G r a c e f u l i n t e r a c t i o n , then, involves dealing appropriately w i t h anything a u s e r h a p p e n s t o
say, rather
t h a n just those inputs in the mainstream of a conversation.
Even though
little
a t t e n t i o n has b e e n paid to it in w o r k to date (PARRY [30] being the main e x c e p t i o n ) ,
b e l i e v e that g r a c e f u l interaction is essential in making natural language i n t e r f a c e s (for
typed
we
both
a n d s p o k e n input) appear cooperative and helpful to their users, and thus in making
c o m p u t e r s y s t e m s more accessible to casual or naive users, and more pleasant and g e n e r a l l y
h a b i t a b l e f o r t h e more e x p e r i e n c e d user.
In this p a p e r , w e claim that the ability to interact gracefully d e p e n d s o n a n u m b e r
relatively
i n d e p e n d e n t skills:
of
skills involved in parsing elliptical, fragmented, and o t h e r w i s e
u n g r a m m a t i c a l input; in e n s u r i n g robust communication; in explaining abilities and limitations,
a c t i o n s a n d the motives b e h i n d them; in keeping track of the focus of attention of a d i a l o g u e ;
in i d e n t i f y i n g things from descriptions, e v e n if ambiguous or unsatisfiable; and in d e s c r i b i n g
things
in t e r m s a p p r o p r i a t e for the context.
While none of these components of
graceful
i n t e r a c t i o n has b e e n e n t i r e l y neglected in the literature, no single current s y s t e m c o m e s c l o s e
t o h a v i n g most of the abilities and behaviours w e describe, and many are not p o s s e s s e d
any
current
systems.
This will become clear
in our discussion of
each component.
b e l i e v e , h o w e v e r , that none of these components is individually much b e y o n d the
by
We
current
s t a t e of t h e art, and later in the paper w e will present the architecture of a s y s t e m
that
c o m b i n e s them all.
T h u s , w e b e l i e v e that g r a c e f u l interaction is an idea of great practical utility w h o s e
time
h a s c o m e , and w h i c h is r i p e for implementation.
g r a c e f u l i n t e r a c t i o n as a field of study, to \dent\fy
This paper is an attempt to
circumscribe
the problems that n e e d to b e s o l v e d , a n d
t o p r e s e n t t h e d e s i g n of a s y s t e m that, for a suitably r e s t r i c t e d domain, is c a p a b l e of
g r a c e f u l interaction.
truly
W e are c u r r e n t l y implementing a gracefully interacting s y s t e m a l o n g t h e
l i n e s p r e s e n t e d ; t h e s y s t e m will initially deal w i t h t y p e d input, but is e v e n t u a l l y i n t e n d e d
to
accept natural speech.
1. Components of Graceful Interaction
G r a c e f u l i n t e r a c t i o n is not a single monolithic skill.
n u m b e r o f d i v e r s e abilities and b e h a v i o u r s .
Rather, it seems to b e c o m p o s e d of
a
In the succeeding sections, w e w i l l d e s c r i b e a s e t
o f a b i l i t i e s a n d b e h a v i o u r s that appear to b e essential for graceful i n t e r a c t i o n .
Although this
s e t c o n t a i n s n e c e s s a r y c o m p o n e n t s of graceful interaction, it is p r o b a b l y not s u f f i c i e n t
g r a c e f u l i n t e r a c t i o n i n g e n e r a l ; h o w e v e r , w e believe it p r o v i d e s a g o o d w o r k i n g b a s i s
for
from
w h i c h t o b u i l d g r a c e f u l l y i n t e r a c t i n g systems, and in particular, those w h i c h p r o v i d e a s i m p l e
service
i n a r e s t r i c t e d domain (e.g. telephone d i r e c t o r y assistance, r e s t a u r a n t
reservations,
c o m p u t e r mail s e r v i c e s ) ; w e will define this class a little more c a r e f u l l y in S e c t i o n 4.2.
T h e c o m p o n e n t s of g r a c e f u l interaction w e describe are b a s e d o n p h e n o m e n a o b s e r v a b l e i n
n a t u r a l l y ' occuring
human
dialogues.
We
believe
it
is
very
important
for
a
gracefully
i n t e r a c t i n g s y s t e m t o conduct a dialogue in as human-like a w a y as p o s s i b l e ; if t h e s t r a t e g i e s
the
system
employs
for
clarifying
its incomprehensions
d e s c r i p t i o n s s u p p l i e d b y the user, etc.
sam.? s i t u a t i o n , t h e n the user
graceful.
of
the user, r e s o l v i n g
ambiguous
are not the same as those a human w o u l d u s e i n t h e
will feel that the interaction is not
natural, a n d h e n c e
not
F u r t h e r m o r e , most of the components of graceful interaction w e h a v e g l e a n e d f r o m
h u m a n c o n v e r s a t i o n s i n v o l v e cooperation b e t w e e n speaker and listener; if the u s e r t r i e s
to
e m p l o y a n y of t h e s e t e c h n i q u e s in his interaction w i t h the system, an inability of t h e s y s t e m
t o c o o p e r a t e w i l l a p p e a r t o him most ungraceful. Thus the f r e e d o m w e w i s h t o g i v e t h e u s e r
t o e x p r e s s himself e x a c t l y as he w i s h e s requires that a gracefully i n t e r a c t i n g s y s t e m b e a b l e
t o d e a l w i t h d i a l o g u e p r o b l e m s in much the same w a y as a human d o e s .
j
U n f o r t u n a t e l y , t h e aim of b e i n g as human-like as possible must be t e m p e r e d b y t h e l i m i t e d
X^* p o t e n t i a l f o r c o m p r e h e n s i o n of any f o r s e e a b l e computer system.
Until a s o l u t i o n is f o u n d t o
t h e p r o b l e m s of o r g a n i z i n g and using the range of w o r l d k n o w l e d g e p o s s e s s e d b y a h u m a n ,
p r a c t i c a l s y s t e m s w i l l o n l y b e able to comprehend a small amount of input, t y p i c a l l y w i t h i n a
s p e c i f i c d o m a i n of e x p e r t i s e .
Graceful interaction must, t h e r e f o r e , supplement its s i m u l a t i o n
o f h u m a n c o n v e r s a t i o n a l ability w i t h strategies to deal naturally and g r a c e f u l l y w i t h i n p u t t h a t
is n o t f u l l y u n d e r s t o o d , and, if possible, to steer a conversation back t o the s y s t e m ' s
ground.
home
3
W e p r o p o s e s e v e n components of graceful interaction.
nor
are
the boundaries
b e t w e e n the several components completely w a t e r t i g h t ; a r g u m e n t s
c o u l d b e made f o r d r a w i n g them in different places.
any
of
This set is p r o b a b l y not s u f f i c i e n t ,
However, it w o u l d be h a r d to a r g u e that
t h e s e v e n w e r e unimportant for graceful interaction.
T h e components w e
propose
are:
-
Robust communication: The set of strategies needed to e n s u r e that
r e c e i v e s a s p e a k e r s utterance, and interprets it correctly.
a
listener
-
Flexible parsing: The ability to deal with naturally used natural language, w i t h all
t h e e l l i p s e s , idioms, grammatical e r r o r s , and fragmentary u t t e r a n c e s it c a n
contain.
-
Domain knowledge: Not strictly-speaking a component of graceful i n t e r a c t i o n , but
a p r e r e q u i s i t e f o r all the other components.
-
Explanation facility: The ability of the system to explain what it can and c a n n o t
d o , w h a t it has d o n e , what it is trying to do, and why, b o t h for r e s p o n s e to
d i r e c t q u e s t i o n s , and as a fall-back w h e n communication breaks d o w n .
-
Focus mechanisms: T h e ability to keep track of what the c o n v e r s a t i o n is about, as
t h e items u n d e r discussion shift; this is important for the r e s o l u t i o n of e l l i p s i s
a n d a n a p h o r a , as w e l l as for continuity in the conversation.
-
Identification from descriptions: The ability to recognize an o b j e c t f r o m a
d e s c r i p t i o n ; this involves the ability to pursue a clarifying dialogue if the o r i g i n a l
d e s c r i p t i o n is not clear.
-
Generation of descriptions: The ability to generate d e s c r i p t i o n s that
are
a p p r o p r i a t e f o r the context, and that satisfy requirements imposed b y the o t h e r
c o m p o n e n t s , e s p e c i a l l y robust communication.
W h i l e n o n e of t h e s e components of graceful interaction have b e e n e n t i r e l y n e g l e c t e d in t h e
l i t e r a t u r e , n o s i n g l e c u r r e n t system comes close to having most of the abilities a n d b e h a v i o u r s
w e d e s c r i b e , and many are not possessed by any current systems.
In w h a t f o l l o w s , w e will consider each component in detail; in particular, in e a c h c a s e we
will d i s c u s s :
-
t h e n a t u r a l language and dialogue phenomena on which the component is b a s e d ;
-
t h e abilities and b e h a v i o u r s needed by a computer dialogue system to p a r t i c i p a t e
g r a c e f u l l y in a c o n v e r s a t i o n containing these phenomena (including t h o s e
d i f f e r i n g f r o m humans, but necessitated by those limitations of c u r r e n t d i a l o g u e
s y s t e m s that are unlikely to be removed in the near future);
-
t h e e x t e n t to w h i c h any current systems contain the component (including
any
r e a s o n s w h y their basic structure or methods might preclude their i n c o r p o r a t i o n
4
of t h e component, and the implications of this for the design of systems
which
t r y t o implement the component).
In t h e s e d i s c u s s i o n s , w e will in general make no distinction b e t w e e n s p o k e n and
typed
l a n g u a g e ; this r e f l e c t s our v i e w that most of the principles of graceful i n t e r a c t i o n a r e
s a m e w h e t h e r the medium of communication is speech or text.
the
In cases w h e r e it d o e s make a
d i f f e r e n c e , mainly those involving problems with speech that do not arise for t y p e d i n p u t , w e
w i l l n o t e t h e d i f f e r e n c e explicitly.
The discussions will be illustrated b y example c o n v e r s a t i o n
f r a g m e n t s ; t h e s e c o n v e r s a t i o n s are intended to be typical of those that might o c c u r
between
a g r a c e f u l l y i n t e r a c t i n g system and its user; accordingly, w e have set them in simple s e r v i c e
d o m a i n s of the t y p e mentioned above (see Section 4.2 for a more precise d e f i n i t i o n of s i m p l e
service
domains).
internal
directory
In fact, w e
have used two domains of discourse in our e x a m p l e s :
assistance system, and a restaurant reservation system.
The
an
examples
c h o s e n a r e b a s e d in part o n protocols of actual conversations (with t w o human p a r t i c i p a n t s ) ,
a n d i n p a r t o n c o n v e r s a t i o n s invented to illustrate particiJIar points.
2. Robust Communication
D u r i n g t h e c o u r s e of a conversation, it is not uncommon for people to m i s u n d e r s t a n d o r fail
to
understand
each
other.
Such
failures
in
communication
do
not
usually
cause
the
c o n v e r s a t i o n t o b r e a k d o w n ; rather, the participants are able to resolve the d i f f i c u l t y , u s u a l l y
b y a s h o r t c l a r i f y i n g sub-dialogue, and continue with the conversation from w h e r e t h e y
off.
Current
computer
systems
are unable to take part
r e s o l v e communication difficulties in any other way.
left
in such c l a r i f y i n g d i a l o g u e s ,
or
As a result, w h e n such d i f f i c u l t i e s o c c u r ,
a c o m p u t e r d i a l o g u e system is unable to keep up its end of the c o n v e r s a t i o n , and a c o m p l e t e
breakdown
is likely to result; this fragility lies in stark and unfavourable c o n t r a s t
to
the
r o b u s t n e s s of human dialogue.
In
this
section, we
discuss
the cooperative
techniques
that
humans
use
to
overcome
d i f f i c u l t i e s in communication, and the w a y s in which these techniques can b e made a v a i l a b l e t o
c o m p u t e r d i a l o g u e systems.
2.1. Introduction - The Principle of Implicit Confirmation
E v e r y time a human b e i n g speaks to another in an attempt to communicate s o m e t h i n g ,
f a c e s t h r e e o b s t a c l e s to getting his message across:
-
t h e l i s t e n e r may not r e c e i v e the message;
-
t h e l i s t e n e r may r e c e i v e the message but be unable to interpret all or p-art of it;
5
-
t h e l i s t e n e r may i n t e r p r e t the message incorrectly.
A l t h o u g h t h e s e difficulties do not occur in the majority of attempts at communication, t h e y a r e
n o t u n c o m m o n in o r d i n a r y human conversation and arise completely u n p r e d i c t a b l y .
them
does
convey
o c c u r , the listener
with
potentially
unpredictable
always
errors
manage
elimination or
to
in communication, human dialogue is extremely robust; p e o p l e
almost
message
across.
for
This
the
conversation.
to
these
their
consequences
intended
Despite
get
serious
robustness
does
not
stem
from
the
minimization of such e r r o r s , but rather from a set of t e c h n i q u e s , b a s e d
cooperation
b e t w e e n speaker
correct
errors
the
will not receive the message that the s p e a k e r
If o n e o f
that
and listener, that humans use to detect, r e c o v e r
occur.
It is these
techniques
and their
application
i n t e r a c t i o n that w e will be considering in this and the following subsections.
on
from,
and
to
graceful
The
present
s u b s e c t i o n , in particular, discusses a convention, tacitly agreed by participants in a h u m a n
d i a l o g u e , o n w h i c h all the techniques are based.
S i n c e a s p e a k e r t y p i c a l l y has no direct way of telling whether his listener has r e c e i v e d his
m e s s a g e c o r r e c t l y , d e t e c t i o n of communication difficulties must rest o n some c o n v e n t i o n
of
a c k n o w l e d g e m e n t , commonly agreed upon by speaker and listener.
One p o s s i b l e c o n v e n t i o n
is
speaker
for
the
listener
essentially
the
to explicitly
technique
acknowledge
everything
that
the
says.
e m p l o y e d for communication b e t w e e n n e t w o r k s of
w h i c h a n a l o g o u s communication problems can arise.
This
is
computers
in
While this convention e n s u r e s
extremely
a c c u r a t e communication, constant explicit acknowledgement would be too tedious f o r h u m a n s .
Instead,
humans
occaisonally
use
allowing
a
convention
inaccurate
which
requires
communication.
much
We
less
overhead
call this c o n v e n t i o n
at
the
the
cost
of
principle
of
implicit c o n f i r m a t i o n , and can state it as follows:
T h e s p e a k e r assumes his message has been received by the listener,
a n d r e c e i v e d c o r r e c t l y , unless the listener indicates o t h e r w i s e .
T h i s a s s u m p t i o n of communication on the speaker's part is analogous to the a s s u m p t i o n o n
t h e s p e a k e r ' s part that the listener will follow any shift in focus he may make, as n o t e d
by
G r o s z [15].
While
this
approach
is
obviously
very
efficient
when
c o m m u n i c a t i o n , it places a large burden on the listener.
that
he
has
conversation
comedies
techniques
not
will
exploit
received
continue
the
message
correctly,
the
there
is
no
to good effect.
This
f o r r o b u s t communication can sometimes fail.
with
Unless the listener c a n d e t e r m i n e
error
will
go
u n c o r r e c t e d and may become quite c o n f u s e d .
s u c h confusions
difficulty
is the
basic
undetected;
Marx
reason
brothers*
that
Fortunately, the l i s t e n e r
the
human
normally
h a s e n o u g h e x p e c t a t i o n s about the sorts of things the speaker might say to make this a r a r e
6
occurrence.
In o r d e r f o r its interaction to appear natural, a gracefully interacting s y s t e m must u s e t h i s
s a m e p r i n c i p l e of implicit confirmation in its conversation with its user.
Given the n e c e s s a r i l y
l i m i t e d linguistic abilities of current systems, a gracefully interacting system is, in fact, l i a b l e
t o r u n i n t o more communication e r r o r s in the form of miscomprehension and i n c o m p r e h e n s i o n
than
is
a
human,
acknowledgement)
Furthermore,
the
and
if
the
to deal
user
will
system
with
these
naturally
tries
to use
e r r o r s , it will
employ
implicit
another
seem
convention
most
(such
ungraceful
as
to
confirmation, and unless
explicit
the
the
user.
system
u n d e r s t a n d s this c o n v e n t i o n and cooperates in it, inaccurate communication, and c o n s i d e r a b l e
f r u s t r a t i o n o n the part of the user is liable to result. As far as w e know, no c u r r e n t d i a l o g u e
s y s t e m c o n d u c t s its dialogues in accordance with this principle of implicit c o n f i r m a t i o n .
A l t h o u g h w e h a v e b e e n using the terms, "speaker" and "listener", e v e r y t h i n g w e h a v e s a i d
a p p l i e s e q u a l l y to dialogue t y p e d at a terminal.
T y p e d dialogues are just as s u s c e p t i b l e
incomprehension
and miscomprehension as spoken dialogues; the only d i f f e r e n c e
sources
problems
of
such
misrecognized
in communication.
In t y p e d dialogues, w o r d s
are
as t h e y can be with speech; on the other hand, one cannot
e r r o r s in s p o k e n dialogue.
is in
to
the
not
usually
make
spelling
In what follows, we will continue to blur the distinction
between
t h e t w o f o r m s of dialogue; our discussion will be ostensibly of s p o k e n dialogue, but all t h e
p o i n t s w e make will a p p l y equally to t y p e d dialogue unless o t h e r w i s e noted.
In t h e next s u b s e c t i o n , w e will discuss details of how the speaker can tell if his m e s s a g e
f a i l s t o b e r e c e i v e d at all; the t w o subsequent subsections discuss t w o simple m e t h o d s
by
which
his
the
listener
comprehension.
can
indicate
his
lack of
comprehension
or
lack of
confidence
In all t h r e e cases w e will comment on the extent to w h i c h the
in
techniques
d i s c u s s e d a r e u s e d in current dialogue systems, and any modifications that are likely t o
required
in
a practical
gracefully
interacting system, particularly
those
required
by
be
the
l i m i t a t i o n s of the c u r r e n t state of the art in language understanding. The final s u b s e c t i o n is a
summary.
2.2. Implicit Acknowledgements
In t h i s s e c t i o n w e consider how the speaker can tell if his message w a s r e c e i v e d at all,
w h e t h e r c o r r e c t l y or incorrectly, and what he can do if it w a s not r e c e i v e d .
As
the
principle
of
implicit
confirmation states, the speaker
r e c e i v e d u n l e s s the listener indicates otherwise.
assumes
his m e s s a g e
was
How can the listener indicate that he h a s
n o t r e c e i v e d a message w h i c h w e are assuming he does not know has b e e n sent?
Obviously,
h e c a n n o t d o this actively or explicitly; instead, he does it tacitly b y not d o i n g a n y t h i n g i.e. b y
7
not r e p l y i n g .
T o put it another way, the speaker expects a reply, and if he d o e s not r e c e i v e
o n e w i l l assume that his message has not been received at all.
This e x p e c t a t i o n p r o v i d e s a
w a y c o n s i s t e n t w i t h the principle of implicit confirmation of detecting communication e r r o r s in
w h i c h t h e l i s t e n e r fails to r e c e i v e any of the speaker's message.
In g e n e r a l human dialogue, replies need not be linguistic; for instance, the l i s t e n e r
s i m p l y p e r f o r m an action that the speaker requested.
interaction,
in
which
conversations
are
either
However, for the p u r p o s e s of g r a c e f u l
typed
or
spoken
without
any
visual
c o m m u n i c a t i o n (as o v e r a telephone), we may confine ourselves to p u r e l y linguistic
The
might
replies.
r e p l y c a n come either w h e n the speaker has finished delivering his message a n d
paused
to
interruption
await
a
before
reply
(typed
input
systems operate
the e n d of the message.
exclusively
this
way)
or
by
If no reply is forthcoming, the s p e a k e r
has
an
will
a s s u m e that his message has not b e e n received.
O n e o p t i o n w i t h a message that has not b e e n received is to repeat it; the more usual a n d ,
if r e p e t i t i o n fails, e v e n t u a l l y n e c e s s a r y strategy is to assume that t h e r e is some fault in t h e
c h a n n e l of communication, and to try either to re-establish communication or to c o n f i r m that
t h e c h a n n e l is d e f u n c t .
might
be
a shouted
W h e n the speaker is in the presence of the listener, this
"can y o u hear
me"; the listener
might be asleep or deaf.
attempt
Over
the
t e l e p h o n e , a much more common situation, the attempt uses phrases such as "Hello!", " A r e y o u
t h e r e ? " , "I c a n hear y o u , can y o u hear me?".
If the original speaker r e c e i v e s s u i t a b l e
replies
t o t h e c h a n n e l c h e c k i n g utterances, the substantive part of the dialogue can p r o c e e d .
If t h e
c h a n n e l is i n d e e d d e f u n c t , the dialogue is at an end. Similar but more conventional f o r m s a r e
u s e d at the b e g i n n i n g of a dialogue to ensure that the channel is indeed o p e n and a v a i l a b l e
for communication.
W e h a v e s a i d that a r e p l y to a message is enough to convince the s p e a k e r of the m e s s a g e
t h a t t h e m e s s a g e w a s indeed received.
or
will
any
reply
do?
Clearly
Are there any constraints o n the f o r m of t h e r e p l y ,
any reply
of
the channel-checking
f o r m will
not
confirm
c o m m u n i c a t i o n , but will actively indicate failure of the message to get across:
S: T h e n u m b e r for Joe Smith is 5 2 6 7
U: H e l l o ! A r e y o u there?
S: Y e s ! C a n y o u hear me?
T h e a p p r o p r i a t e r e s p o n s e is to participate in a channel checking dialogue w i t h the user.
Apart
imply
f r o m this case, t h e r e are v e r y f e w other responses w h i c h w o u l d not b e t a k e n
reception
(correct
or
incorrect) of the message.
It is not n e c e s s a r y
to o b t a i n
to
an
a n s w e r t o a q u e s t i o n ; counter questions and e v e n quite distantly related statements s e r v e as
replies:
8
U: W h a t is the number for Joe Smith?
S:
Do y o u k n o w the address?
U:
Sorry!
I mean Bill Smith.
If t h e l i s t e n e r g i v e s a r e p l y that appears completely inappropriate to the original s p e a k e r , h e
is m o r e l i k e l y to assume that the listener received the original message i n c o r r e c t l y ,
rather
t h a n n o t at all.
A g r a c e f u l l y interacting system should, of course, conform to the human c o n v e n t i o n s
about
implicit
a c k n o w l e d g e m e n t s , i.e. it should reply to all of its user's utterances w i t h i n a s h o r t
period
(possibly
giving
a time
filling reply
or
request
to "hold o n " if
it cannot
give
a
s u b s t a n t i v e r e p l y s o o n enough), and it should expect its user to r e p l y to it in a similarly s h o r t
time.
If
this
expectation
is
not
met,
it
should
be
able
to
initiate
c h a n n e l - c h e c k i n g dialogue, and if possible, return to its original utterance.
and
complete
a
Of c o u r s e , it must
a l s o b e a b l e to s u s p e n d this kind of "time-out" expectation if the user asks it t o w a i t , a n d
n a t u r a l l y t h e r e is no n e e d to limit the "patience" of such a system v e r y tightly.
T h e i m p o r t a n c e of r e p l y i n g to the user within a short time of his input has b e e n s t r e s s e d
i n t h e L I F E R s y s t e m [20]. T h e immediate replies, however, took the form of p r o g r e s s r e p o r t s
o n t h e p r o c e s s i n g of the input, and essentially s e r v e d as time-fillers.
D e s c r i p t i o n s of
other
s y s t e m s s u c h as P A R R Y [30] and PLANES [40] have stressed the need for r a p i d r e p l i e s , b u t
these
systems
time-filling
do
reply
not attempt to monitor the s p e e d of their o w n r e p l i e s
if
it
is
too
slow.
One
version
h o w e v e r , a p o l o g i z e if it took too long to respond.
of
the
SCHOLAR
and p r o d u c e
system
[4]
a
did,
No systems to our k n o w l e d g e a r e a b l e t o
c o n d u c t a c h a n n e l - c h e c k i n g dialogue.
2.3. Explicit Indications of Incomprehension
The
principle
whenever
he
of
fails
implicit
confirmation
to comprehend
message incorrectly.
requires
a message or
a listener
suspects
to give
that
he
an explicit
may
have
indication
received
a
In this section we deal with the most s t r a i g h t f o r w a r d w a y a l i s t e n e r c a n
d o t h i s : b y asking the speaker what he said or meant.
Such questions v a r y a c c o r d i n g t o h o w
m u c h of t h e o r i g i n a l utterance the listener thinks he understood, h o w informative h e t r i e s t o
be
about
the
nature of the problem, and how much he wants to influence the
speaker's
subsequent reply.
The
speaker
least
informative
way
of
indicating incomprehension, and indirectly
of
asking
t o r e p e a t his message is through a phrase such as "I b e g y o u r p a r d o n ? " o r
the
"What
w a s t h a t ? " If the lack of understanding was caused b y a transient p r o b l e m s u c h as n o i s e
on
t h e c h a n n e l o r inattention o n the part of the speaker, this strategy may result in a s e c o n d
a n d s u c c e s s f u l attempt at communication.
If, on the other hand, the failure w a s c a u s e d b y a
9
n o n - t r a n s i e n t p r o b l e m such as the listener's unfamiliarity with the w o r d s or c o n s t r u c t i o n u s e d
(uncommon
with
human beings, but
likely to occur
frequently
w i t h computer
systems),
s e c o n d attempt at communication is no more likely to be successful than the first.
original
speaker
is g i v e n no clue to the nature of
the difficulty
w h a t h e is s a y i n g in a hit and miss fashion.
Since the
the listener is h a v i n g
u n d e r s t a n d i n g w h a t he is saying, he is unable to correct that difficulty e x c e p t b y
a
in
changing
Successful communication is more likely t o r e s u l t
if t h e l i s t e n e r c a n p r o v i d e some clues to the nature of the problem, and the c o n v e n t i o n s
of
h u m a n d i a l o g u e make it e a s y to do so. Indeed, we believe that the p r o v i s i o n of i n f o r m a t i o n t o
t h e o r i g i n a l s p e a k e r about w h e r e and what he is failing to communicate is fundamental t o t h e
e x t r a o r d i n a r y r o b u s t n e s s of human conversation.
Without the ability to z e r o in o n p r o b l e m s ,
h u m a n c o n v e r s a t i o n c o u l d not achieve robustness in the apparently e f f o r t l e s s w a y
that
it
does.
T h e s p e c i f i c i t y of the clues the listener can provide to the speaker about the n a t u r e
and
l o c a t i o n of a communication difficulty depends on the d e g r e e to w h i c h the listener f a i l e d t o
u n d e r s t a n d the u t t e r a n c e .
If he understood nothing at all, the only clue he c a n p r o v i d e
is
w h a t h e e x p e c t e d the s p e a k e r to say:
S: T h i s is d i r e c t o r y assistance.
U: <GARBLE>
S:
I didn't q u i t e c a t c h that; can I get a number for you?
If h e h a s u n d e r s t o o d something, he can show what he has understood while indicating that h e
h a s n o t u n d e r s t o o d fully:
U: W h a t is the number for <GARBLE>?
S: W h o s e number did y o u want?
U: W h a t is *he number for Jim <GARBLE>?
S: Jim w h o ?
T h i s a l l o w s t h e original speaker to concentrate on transmitting only the part of the m e s s a g e
that w a s not u n d e r s t o o d , making "Smith", for instance, a suitable r e p l y in the s e c o n d e x a m p l e .
If t h e l i s t e n e r has u n d e r s t o o d enough to narrow the possibilities d o w n to a small n u m b e r
he
c a n list them.
U: W h a t is the number for <?>ill Smith?
S: Did y o u s a y Bill Smith or Phil Smith?
T h i s s t r a t e g y allows the original speaker to solve the communication p r o b l e m w i t h a p h r a s e
l i k e " t h e s e c o n d o n e " , w h i c h is presumably much less susceptible to e r r o r than r e p e t i t i o n of
t h e p a r t of the communication that originally caused the e r r o r ("Bill" or "Phil").
the
information
given
by
these
various
strategies
to the original s p e a k e r
In g e n e r a l ,
allows
him
c o n c e n t r a t e o n imparting the information that did not get across, and thus makes f o r
to
much
10
g r e a t e r d i r e c t e d n e s s and robustness in the resolution of communication difficulties.
A s a final
s t r a t e g y , if the listener has p r o d u c e d an interpretation for the utterance, but is u n c e r t a i n of
t h a t i n t e r p r e t a t i o n b e c a u s e of noise or because the interpretation is incongruous, he c a n ask
e x p l i c i t l y if his complete interpretation is correct.
T h e w a y in w h i c h incomprehension is indicated can influence the form of the
clarification.
subsequent
T h i s fact is of particular importance for a gracefully interacting s y s t e m
limited p o w e r
of comprehension.
It is not sufficient for such a s y s t e m m e r e l y to
with
indicate
i n c o m p r e h e n s i o n , or e v e n to indicate it informatively, so that its user is made c l e a r l y a w a r e of
t h e n a t u r e of its difficulty in comprehension; it should also indicate its i n c o m p r e h e n s i o n in a
f o r m most l i k e l y to elicit a clarification from the user that the system c a n c o m p r e h e n d .
For
i n s t a n c e , as C o d d [6] points out, questions of the form "What do y o u mean b y ..." a r e liable t o
e l i c i t a d e s c r i p t i o n o n a level much too sophisticated for such a system.
T h u s if t h e s y s t e m
d o e s not u n d e r s t a n d " e x t e n s i o n " in:
W h a t is the e x t e n s i o n for Jim Smith?
it is u n w i s e to r e p l y :
W h a t is an extension?
b u t b e t t e r to ask:
Do y o u w a n t Jim Smith's number or his address?*
Both
replies
indicate
the
problem
word
exactly,
and
are
thus
in
some
sense
equally
i n f o r m a t i v e about the nature of the problem in comprehension, but the s e c o n d is p r e f e r a b l e
s i n c e it e n c o u r a g e s the user to use either "number" or "address" in his r e p l y , and h e n c e g i v e
t h e s y s t e m a g o o d chance of understanding it.
W h i l e a g r a c e f u l l y interacting system should try to shape its users r e s p o n s e s , t h e f o r m of
t h e r e s p o n s e s cannot b e guaranteed.
In particular, the system cannot p o s e multiple c h o i c e
q u e s t i o n s to the user and expect the user always to pick one of the choices g i v e n , as
many
current
s y s t e m s including Codd's RENDEZVOUS system [6].
Multiple c h o i c e
questions
a r e c o n t r a r y to the w h o l e spirit of graceful interaction and sequences of them w o u l d
frustrate
a
alternatives
user.
A
gracefully
to the user
interacting
system
can
and
should
be
able
to
soon
present
as in the last example, but it must be p r e p a r e d f o r t h e u s e r
e x p r e s s his c h o i c e in his o w n w a y or to come up with an entirely different o p t i o n .
Do y o u w a n t Jim Smith's telephone number or his address?
I t is, of course, highly desirable that the system should note the reply to th.s question, and remember it if he
user ever says the word, extension, again, since repetition of such a question would surely be very fruttralmf
Ua user Single word learning of this sort has been discussed by Carboneil in connexion with his Politics system
[3] In general, the problems of learning from mistakes and adapting to the idiosyncr.cie. of individual users ere
aspects of graceful interaction we have chosen not to deal with.
!
do
to
11
E x a m p l e s of r e a s o n a b l e r e s p o n s e s include:
T h e number.
T h e first one.
Both.
I want
Do y o u
I want
I want
to p h o n e him.
k n o w w h e r e he is now?
his s o c i a l - s e c u r i t y number.
the number, but I meant Joe Smith.
T h i s r a i s e s the p o s s i b i l i t y that the user's clarification will also be misunderstood, a n d of
a
sequence
a
of
misunderstandings
and clarifications
failing
to c o n v e r g e .
In s u c h
cases,
g r a c e f u l l y i n t e r a c t i n g s y s t e m will have to become more and more explicit about the n a t u r e of
its p r o b l e m s and its limitations and r e l y on the user to accommodate himself (see S e c t i o n 5.6.
In s h o r t , a g r a c e f u l l y interacting system must be able to e x p r e s s its o w n
incomprehension
of o r u n c e r t a i n t y about what the user said, and be able to deal w i t h the user's r e p o r t s of his
own
problems
in
comprehending
the
system.
While
most
systems
can
indicate
i n c o m p r e h e n s i o n in some w a y , most are uninformative and undirective in the s e n s e s w e h a v e
discussed.
As
complaints
of
far
as w e
know, no current system can r e s p o n d
incomprehension
by its user.
appropriately
to
explicit
Clearly, formulating adequate r e p l i e s to
such
c o m p l a i n t s w o u l d r e q u i r e a system to keep track of the things it said and the r e a s o n s
for
s a y i n g them.
2.4. E c h o i n g and the Use of Fragmentary Recognition
O n e r e s t r i c t i o n a s s o c i a t e d w i t h explicit indications of incomprehension is the e x p e c t a t i o n o f
a reply.
In this s e c t i o n w e discuss a technique, called echoing, w h i c h the listener c a n u s e t o
c o n f i r m his i n t e r p r e t a t i o n of a message, but which does not require a r e p l y from the o r i g i n a l
s p e a k e r if the i n t e r p r e t a t i o n is correct; a reply is needed if the i n t e r p r e t a t i o n t u r n s o u t t o b e
incorrect.
E c h o i n g is thus an efficient way of confirming interpretations of w h i c h the l i s t e n e r
is f a i r l y s u r e , but is less useful for cases in which the listener
is less c e r t a i n .
We
also
m e n t i o n a n o t h e r w a y of reducing the number of explicit indications of i n c o m p r e h e n s i o n - b y
u s i n g f r a g m e n t a r y recognitions of utterances as the basis for complete i n t e r p r e t a t i o n s .
interpretations
can
never, of
course, be guaranteed
to be c o r r e c t , and t h e y
Such
should
be
c o n f i r m e d through echoing.
If
the
original
speaker
receives
a message
indicating that
his
message
has
not
c o m p l e t e l y or c o n f i d e n t l y understood, he will normally try to clarify or c o n f i r m his
message.
been
original
In p a r t i c u l a r , an explicit request for confirmation cannot be a n s w e r e d implicitly:
U: What is the number <?>ter Smith?
S: Did y o u s a y Walter Smith?
U: C a n y o u also g i v e me his address?
12
T h i s e x a m p l e is unnatural unless the first speaker says " y e s " b e f o r e asking f o r the a d d r e s s .
It
would
become
tedious
to answer
too many explicit
requests
for
confirmation, s o
f o r t u n a t e that human dialogue provides a method of asking for confirmation in a w a y
it
is
that
a l l o w s an affirmative r e s p o n s e to be given implicitly. When the listener w a n t s to b e s u r e that
h i s i n t e r p r e t a t i o n of the speaker's utterance (or more commonly part of it) is c o r r e c t , he h a s
o n l y t o e c h o his i n t e r p r e t a t i o n .
h e i m p l i c i t l y c o n f i r m s it.
If the original speaker does not comment o n the e c h o , t h e n
Thus:
U: W h a t is the number for <?>ter Smith?
S: W a l t e r Smith ...
S: His number is 5 5 9 2 .
U: 5 5 9 2 . Thank y o u .
S: Y o u ' r e w e l c o m e .
Of c o u r s e , t h e o r i g i n a l speaker still has the option of confirming the e c h o explicitly; a n d if t h e
e c h o is i n c o r r e c t , he must indicate that explicitly.
Direct
echoes
of
this
type
are
useful
because
they
can be
confirmed
implicitly,
thus
a l l o w i n g a l i s t e n e r to v e r i f y his understanding without disrupting the flow of the c o n v e r s a t i o n
b y a clarifying sub-dialogue.
Nevertheless, a direct echo is still somewhat d i s r u p t i v e b y
its
v e r y p r e s e n c e ; if it w e r e not for its verifying function, it would be completely s u p e r f l u o u s t o
the conversation.
small
amount
of
T h e r e is, however, another verifying technique w h i c h avoids e v e n
disruption.
The
technique, which
we
will
call
indirect
echoing,
this
is
to
i n c o r p o r a t e t h e assumption to be verified into the listener's next utterance in the normal f l o w
of the conversation.
For example:
U: What is the <GARBLE> for Jim Smith?
S: T h e number for Jim Smith is 2597.
In this e x a m p l e , S is able to determine from* the linguistic and non-linguistic
< G A R B L E > is almost certainly "number".
an indirect echo.
context
that
S then incorporates this assumption into his r e p l y as
T h e net effect is much the same as for a direct echo ("the number"); if U,
t h e o r i g i n a l s p e a k e r , d o e s not comment on this indirect echo it is implicitly c o n f i r m e d , a n d if
h e w i s h e s to c o r r e c t it explicitly, he may do so.
is
the
absence
of
any
utterance
by
S
The essential d i f f e r e n c e from a d i r e c t
outside
the
natural
flow
of
the
echo
conversation.
I n t e r e s t i n g l y e n o u g h , this seems to deny the original speaker the o p p o r t u n i t y to c o n f i r m t h e
assumption explicitly.
T h e f o r m of an e c h o need not always correspond to the form of the u t t e r a n c e w h i c h
e c h o e d , b u t c a n b e a p a r a p h r a s e of it.
is
The following examples s h o w p a r a p h r a s e d e c h o e s of
t h e d i r e c t a n d indirect t y p e s respectively.
13
U: I w a n t to contact Roger Smith.
S: T h e n u m b e r for Roger Smith ...
S: It's 2 5 9 7 .
U: I w a n t to <GARBLE> Jim Smith.
S: T h e n u m b e r for Jim Smith is 2957.
A n u m b e r of q u e s t i o n a n s w e r i n g systems, including RENDEZVOUS [6] and C O O P [ 2 3 , 2 8 ] , h a v e
adopted
the
policy
of
presenting
the
user
with
a paraphrased
g e n e r a t e d f r o m the system's internal representation of that input.
chance
to
find
and
correct
i n t e r p r e t a t i o n of his input.
considered
graceful
any
misinterpretations
made
by
version
of
each
input,
This g i v e s the u s e r
the
system
during
However, the unconditional generation of p a r a p h r a s e s c a n n o t
interaction, and quickly become tedious for the user.
Direct
its
be
echoes,
w h e t h e r p a r a p h r a s e s or not, should only be generated w h e n there is some significant
o f u n c e r t a i n t y in the interpretation.
a
degree
Selective paraphrase generation along t h e s e lines, w h i c h
o f c o u r s e i n v o l v e s deciding w h e n uncertainty exists, has b e e n attempted b y C a r b o n e l l in his
M I C S s y s t e m [3].
In a d d i t i o n to its use for checking the correctness of interpretations, e c h o i n g , e s p e c i a l l y
direct
echoing, can
also
serve
as a time filler.
In an analysis of p r o t o c o l s
of
directory
a s s i s t a n c e c o n v e r s a t i o n s , w e found that it was common to echo the name b e i n g a s k e d
while
t h e number w a s b e i n g found (which typically took several seconds).
T h i s is
for
perhaps
e x p l a i n a b l e b y a listener's need (discussed in Section 2.2) to r e p l y in o r d e r to c o n v i n c e
speaker
served
that
just
his u t t e r a n c e has b e e n received at all.
the same p u r p o s e
It appears that e c h o e s in this
as a phrase such as: Just
H
a second!", indicating
the
case
that
the
c o m m u n i c a t i o n had b e e n r e c e i v e d , but that there would be a delay b e f o r e the real r e p l y w a s
given.
It is i n t e r e s t i n g to note that a full echo of the input is rarely given in s u c h t i m e - f i l l i n g
e c h o e s ; In t h e p r o t o c o l s , p e o p l e e c h o e d only the name as o p p o s e d to any o t h e r p o r t i o n s of
t h e r e q u e s t f o r a number.
In doing this they are perhaps conforming to the c o n v e n t i o n s
t h e c l a r i f y i n g t y p e of echo, b y echoing the part of the input least p r e d i c t a b l e in the
of
given
c o n t e x t , a n d t h e r e f o r e most liable to misinterpretation.
In o r d e r
to conduct natural dialogues, and to avoid frustrating its user w i t h a s t r e a m of
t r i v i a l c o m p l a i n t s of incomprehension, a gracefully interacting system should c o n f o r m to
the
h u m a n c o n v e n t i o n s of echoing; i.e. it should be able to issue echoes (direct or i n d i r e c t ) w h e n
it is u n c e r t a i n in its comprehension, or w h e n appropriate as time-fillers, and s h o u l d b e a b l e t o
monitor
and
make
use
of
any corrections
its user
offers
in r e s p o n s e .
It must
also
be
c o n s t a n t l y o n the w a t c h for e c h o e s by the user of what it says, and should b e r e a d y t o i s s u e
c o r r e c t i o n s as n e c e s s a r y .
A s far as w e know, none of these components of e c h o i n g has e v e r
b e e n i m p l e m e n t e d in a dialogue system.
14
A w a y more radical than echoing of avoiding explicit requests for clarification is to i g n o r e
u n c o m p r e h e n d e d but inessential elements of what the user said.
to
tell
whether
uncomprehended.
try.
a
uncomprehended
input
is
essential
or
not,
simply
because
it
is
H o w e v e r , there are a number of heuristic strategies that the s y s t e m c a n
F i r s t , if the incomprehensible element is small enough and is contextually d e t e r m i n e d as
f u n c t i o n w o r d or
"noise phrase", it can be ignored.
It is o f t e n unimportant
d e t e r m i n e r is d e f i n i t e or indefinite or which preposition is used.
be
In g e n e r a l , it is i m p o s s i b l e
i g n o r e d t h r o u g h a k e y - w o r d approach.
users
are
likely
to
offer
explanations
whether
a
C e r t a i n larger s e g m e n t s c a n
For instance, as was found in w o r k o n G U S [ 2 ] ,
for
their
desires, so
that
if
any
unrecognizable
s e q u e n c e c a n b e f o u n d w h i c h begins with "because", "since", etc., it is not u n r e a s o n a b l e
to
i g n o r e that s e q u e n c e as an irrelevant explanation.
A m o r e g e n e r a l w a y of deciding whether or not a partial comprehension is s u f f i c i e n t is t o
d e c i d e w h e t h e r the r e c o g n i z e d components can be combined into something that t h e
w o u l d f i n d it r e a s o n a b l e for the user to say.
system
For example, if a travel agent s y s t e m c o u l d
e x t r a c t a c i t y and a date from what a user said to it, it could build a request for a r e s e r v a t i o n
t o that c i t y o n that date. T h e combination of recognized fragments is, in f3ct, the b a s i s of t h e
q u e s t i o n a n s w e r i n g ability of Waltz's PLANES system [40], and other w o r k o n the c o m b i n a t i o n
o f s u c h f r a g m e n t s has b e e n done b y Fox and Mostow [11].
N o n e o f the a b o v e methods of ignoring certain unrecognized elements of an input a r e c l o s e
to
being
foolproof;
comprehension.
they
can,
and
sometimes
will,
produce
fundamental
errors
F o r this reason it is important that a gracefully interacting s y s t e m
in
make
c l e a r w h a t assumptions it is making and what the results of its c o m p r e h e n s i o n r e a l l y a r e .
would
be extremely
tedious for the user if the system always made explicit
It
requests
for
c o n f i r m a t i o n of e v e r y t h i n g it was unsure of, particularly if the language c a p a b i l i t i e s of
the
s y s t e m w e r e less than complete as w e might expect for systems in the f o r s e e a b l e f u t u r e .
A
h e a v y u s e of d i r e c t e c h o e s is little better, since the user would scarcely feel that e x c e s s i v e l y
p a r r o t - l i k e b e h a v i o u r w a s v e r y graceful.
A more palatable alternative is f o r the s y s t e m t o
i n c o r p o r a t e t h e assumptions it makes into its reply through an indirect e c h o .
F o r e x a m p l e , if
ail a t r a v e l agent s y s t e m c o u l d extract from:
I am i n t e r e s t e d in p a y i n g a visit to Pittsburgh on or around M a y 17
w a s " P i t t s b u r g h " and "May 17", then its r e p l y could be:
A b o u t what time o n M a y 17 would y o u like to go to Pittsburgh?
T h i s i n d i r e c t e c h o , if u n c o r r e c t e d by the user, will confirm both the system's i n t e r p r e t a t i o n o f
t h e f r a g m e n t s , as w e l l as its assumptions about their relationship (it c o u l d h a v e b e e n
P i t t s b u r g h i n s t e a d of to Pittsburgh).
The notion of always making the system's
from
assumptions
15
e x p l i c i t is p r e s e n t in the w o r k of Codd [6]; however, Codd suggested that his s y s t e m s h o u l d
present
its u n d e r s t a n d i n g of each of the user's requests for explicit a p p r o v a l b y the
user
b e f o r e p r o c e e d i n g to satisfy the request.
2.5. Summary of Robust Communication
We
can
summarize
the
techniques
of
robust
communication, and their
implications
g r a c e f u l l y i n t e r a c t i n g systems as follows:
-
Human d i a l o g u e e n s u r e s accurate communication without the o v e r h e a d of explicit
a c k n o w l e d g e m e n t b y using the principle of implicit confirmation, w h i c h allows t h e
s p e a k e r to assume his message has been received c o r r e c t l y b y the listener,
u n l e s s the listener indicates otherwise.
-
A s p e a k e r e x p e c t s his listener to reply. If the listener does not r e p l y w i t h i n a
v e r y s h o r t time, the original speaker will assume his message has not got a c r o s s ,
a n d initiate a dialogue to check the channel of communication. From the point of
v i e w of the p r i n c i p l e of implicit confirmation, the absence of a r e p l y s e r v e s as a
tacit indication b y the listener that he has not r e c e i v e d the
speaker's
communication. A gracefully interacting system must be able to: c o n f o r m to the
c o n v e n t i o n of immediate reply, if necessary with a time-filling r e p l y ; initiate a
c h a n n e l - c h e c k i n g dialogue if its user does not reply; and participate in a
c h a n n e l - c h e c k i n g dialogue started by the user.
-
T h e simplest w a y for the listener to indicate that he has not r e c e i v e d the
s p e a k e r ' s message c o r r e c t l y is to ask him what he said or meant, while p o s s i b l y
at the same time indicating the nature and source of the p r o b l e m in
communication.
Such an indication requires a reply by the original s p e a k e r .
D i f f e r e n t w a y s of asking the question tend to constrain the r e p l y to a g r e a t e r o r
lesser extent.
T h e necessarily limited linguistic abilities of a dialogue s y s t e m
s u g g e s t that it s h o u l d ask the most constraining questions it can in s u c h
situations.
W h e n a listener thinks he has understood what the speaker said, but is not q u i t e
s u r e , he c a n e c h o (the tentative part) of what he thought he h e a r d , e i t h e r
d i r e c t l y in a s e p a r a t e utterance, or indirectly by incorporating the e c h o into his
n e x t u t t e r a n c e in the normal flow of the conversation. If the original s p e a k e r
d o e s not c o r r e c t the echo, it is implicitly confirmed. Echoing thus allows the
l i s t e n e r to c o n f i r m his interpretation, without requiring a r e p l y from the s p e a k e r .
A d i a l o g u e s y s t e m can avoid many trivial questions by using this technique, a n d
b e i n g r e a d y to accept any corrections that are given. In any case, a s y s t e m
must monitor its user's echoes of what it says, and be p r e p a r e d to c o r r e c t a n y
mistakes.
-
O t h e r explicit indications of incomprehension, with their requirements for r e p l i e s
c a n b e a v o i d e d if the system guesses what the user said on the basis of p a r t i a l
o r f r a g m e n t a r y recognition of the input. Assumptions made in this w a y must b e
e c h o e d to a v o i d c o n f u s i o n .
for
16
3. Flexible Parsing
T h e l a n g u a g e u s e d in naturally occurring dialogues b e t w e e n humans d e v i a t e s in many w a y s
f r o m c o m m o n l y a c c e p t e d standards of grammatically.
There may be i n c o r r e c t
prepositions,
i n f l e x i o n s , o r lack of agreement; utterances may be elliptical, fragmentary, o r idiomatic; t h e r e
m a y b e o m i t t e d or r e p e a t e d w o r d s or phrases; utterances may be b r o k e n off and r e s t a r t e d at
any
point.
sufficiently
These
deviations
occur
most
frequently
common in t y p e d conversations
in
spoken
that any computer
dialogues,
but
system w h i c h
are
also
attempts
to
e n g a g e humans in g r a c e f u l natural language dialogue of the t y p e d or s p o k e n v a r i e t y must b e
a b l e t o d e a l w i t h them.
U n f o r t u n a t e l y , most c u r r e n t parsing methodologies are not well suited to the a n a l y s i s o f
such
deviant
input.
Most
systems
analyse
linguistic
input
in
accordance
with
their
e x p e c t a t i o n s , grammatical and otherwise, which cannot practically be e x t e n d e d to c o v e r all, o r
e v e n most, of the p o s s i b l e deviations; input failing to conform to these e x p e c t a t i o n s , b y
s o m u c h as a single w o r d is typically totally incomprehensible to such systems.
even
In s h o r t , most
c u r r e n t p a r s i n g s y s t e m s are quite inflexible in the face of deviant input.
B y f l e x i b l e parsing w e mean parsing which can deal with input deviating f r o m grammatical
norms
and/or
the
grammatical expectations of
the system doing the
parsing.
Given
the
c o m m o n o c c u r r e n c e of grammatical deviations in naturally occuring dialogues, f l e x i b l e p a r s i n g
is
clearly
of
prime
importance
for
graceful interaction.
In the
following
subsection
we
d e s c r i b e v a r i o u s w a y s in w h i c h language in natural dialogues can and d o e s d e v i a t e f r o m t h e
g r a m m a t i c a l norm; in a s e c o n d subsection, we consider current parsing techniques, t h e w a y in
w h i c h t h e y h a v e b e e n u s e d to deal (or fail to deal) with these deviations, and their p o t e n t i a l
in t h i s r e g a r d .
3.1. Grammatical Deviations
In this s e c t i o n w e d e s c r i b e some of the most common t y p e s of deviations f r o m
grammar
that
arise
in
natural
human
conversations:
idioms,
fragmentary
standard
utterances,
o m i s s i o n s , r e p e t i t i o n s , insertion of noise phrases, small grammatical e r r o r s , and e l l i p s i s .
Idioms: E v e r y d a y language contains many idioms, i.e. phrases w h o s e i n t e r p r e t a t i o n c a n n o t
b e o b t a i n e d b y using the components of the phrase in the usual w a y ("a w i l d g o o s e c h a s e "
h a s v e r y little to d o w i t h geese).
Even though the structure of idioms is o f t e n grammatical, it
is u n h e l p f u l to p a r s e them, since b y definition their meaning cannot b e o b t a i n e d f r o m t h e i r
c o m p o n e n t s ; it is n e c e s s a r y to interpret them as a whole.
17
M a n y o t h e r p h r a s e s while not strictly speaking idiomatic can o f t e n c o n v e n i e n t l y be d e a l t
w i t h as a w h o l e .
It is possible, for instance, to analyse:
C o u l d y o u g i v e me the number for Joe Smith
as a c o n d i t i o n a l q u e s t i o n about ability, and from that to recognize it as an indirect s p e e c h act
m a k i n g a r e q u e s t f o r information.
probably
On the other hand, a d i r e c t o r y assistance s y s t e m
would
f i n d it more convenient to interpret "could y o u give me" d i r e c t l y as a r e q u e s t
for
information.
Fragmentary utterances: Humans are adept at piecing together fragments of an u t t e r a n c e t o
c o m e u p w i t h an i n t e r p r e t a t i o n of the utterance as a whole.
fragmentary
recognition
occurs
mainly
when
noise
of
For p e o p l e , the n e e d f o r
some
sort
or
lack
of
such
clarity
in
p r o n u n c i a t i o n has made it impossible for them to hear all of an utterance, or w h e n a s p e a k e r
has
actually
frequent
spoken
in disjointed
fragments.
In the case
of computer
systems, a
more
n e e d f o r fragmentary recognition might arise through ignorance of v o c a b u l a r y
constructions.
or
In e i t h e r case, dealing with such fragmentary input, as humans s e e m able to
do, r e q u i r e s an ability to extract the largest recognizable fragments from the input, and to
use t h o s e f r a g m e n t s to construct a reasonable interpretation.
In many
fragments
c a s e s , if it is possible
at
all, t h e n
that
to construct
interpretation
will
a reasonable
be
the correct
i n t e r p r e t a t i o n of
one.
For
recognized
example, if
the
f r a g m e n t s " g i v e m e " and "the number for Joe Smith" w e r e recognized out of:
W o u l d y o u b e so kind as to give me your listing of the number* for J o e Smith
t h e n t h e o b v i o u s i n t e r p r e t a t i o n would be the right one.
On the other hand, fragments can be
p i e c e d t o g e t h e r i n c o r r e c t l y , as in the same limited understanding of:
I a s k e d y o u to g i v e me the number for Joe Smith, but I meant F r e d .
Thus,
allowing
portion
of
fragmentary
an u t t e r a n c e
recognition raises the question of
w h e n the
can be safely ignored, and w h e n it cannot.
uncomprehended
There
is s u r e l y
no
s o l u t i o n t o this p r o b l e m that is guaranteed always to be correct, but as d i s c u s s e d in S e c t i o n
2.4, t h e t e c h n i q u e s of robust communication are helpful w h e n an i n t e r p r e t a t i o n built
from
f r a g m e n t s t u r n s out to b e w r o n g .
Omissions, repetitions, and noise phrases: It is possible to omit and r e p e a t w o r d s o r
phrases
o r i n s e r t n o i s e p h r a s e s into an utterance without significantly decreasing its intelligibility as
in:
W h a t the the number for lemme see Joe Smith?
Of c o u r s e , f u n c t i o n w o r d s have the least effect, but in the proper context content w o r d s or
IS
p h r a s e s may b e omitted or r e p e a t e d without loss of comprehension.
Omission and r e p e t i t i o n
is m u c h m o r e of a p r o b l e m in s p e e c h input than in typed input (though it d o e s o c c u r in t h e
latter).
parts
T h e n o i s i n e s s of a s p e e c h signal can often effectively cause omission of w o r d s
of
w o r d s , (particularly small function words); while repetition commonly o c c u r s
or
as
p e r s o n r e p e a t s (or w o r s e , replaces) words, or sequences of w o r d s as he is p r o d u c i n g
a
an
u t t e r a n c e as in:
W h a t is e r c o u l d er y o u g...give me the number er the extension for J o e Smith?
This
written
representation
does
not
indicate
the
prosodic
features
-
rising
i n t o n a t i o n s , p a u s e s , and stress - present in such utterances; these f e a t u r e s
and
falling
may b e
very
i m p o r t a n t in human comprehension of utterances containing such repetition, b r e a k i n g - o f f , a n d
r e s t a r t i n g , as w e l l as in comprehension of more fluent utterances.
Grammatical
Errors:
It
is
not
uncommon
through
carelessness
or
ignorance,
m i s t a k e s in t h e use of language without materially affecting comprehensibility.
to
make
Examples
are
i n c o r r e c t t e n s e s , lack of agreement b e t w e e n subject and v e r b , using the w r o n g p r e p o s i t i o n ,
and, for
typed
input, misspelling.
Humans appear to be remarkably
unaffected
by
these
e r r o r s in t h e i r language understanding, often not e v e n noticing that e r r o r s have b e e n made.
Ellipsis: It is not uncommon in dialogue to say things that w o u l d be meaningless o u t s i d e t h e
c o n t e x t o f the d i a l o g u e :
U: What is the number for Mr. Smith?
S:
D o y o u mean J o e Smith or F r e d Smith?
U:
Joe
In t h i s i n t e r c h a n g e , " J o e " is said to be an ellipsis of the more complete "I mean J o e S m i t h " .
N e e d l e s s to s a y , humans are seldom confused by this such omissions, and are able t o u s e t h e
linguistic
a n d non-linguistic context to understand the elliptical utterance w i t h o u t
difficulty.
T h e r e is no c l e a r indication whether human resolution of ellipsis involves the c o n s t r u c t i o n o f
a m o r e c o m p l e t e linguistic form.
3.2. Implications for Language Analysers
H a v i n g p r e s e n t e d some of the w a y s in which naturally used language c a n d e v i a t e
g r a m m a t i c a l norms, w e t u r n to the question of how current parsing techniques c a n c o p e
s u c h d e v i a t i o n s , b o t h in practice and in theory.
including
traditional
syntactic
and
degrees
There
of
top-down
semantic
left-to-right
v a r i e t y , conceptual
We will consider a w h o l e range of
parsers
using
fixed
grammars
parsers, pattern-matching
sophistication, and parsers developed
for
use specifically
of
parsers
with
with
parsers,
both
of
to
the
various
spoken
is no intention, h o w e v e r , to s u r v e y the field of parsing, so r e f e r e n c e s
from
input.
existing
19
s y s t e m s w i l l b e c u r s o r y , and will assume prior knowledge on the part of the r e a d e r .
The
on
most common p a r s i n g technique in use today is top-down left-to-right
a
fixed
grammar.
This
technique
is usually
implemented
by
parsing
an augmented
based
transition
n e t w o r k ( A T N ) of the t y p e d e v e l o p e d by Woods [45], or some close variation; s u c h p a r s e r s
c a n p a r s e in t e r m s of either general grammatical categories as in the LUNAR s y s t e m [ 4 6 ] , o r
i n t e r m s of c a t e g o r i e s having some significance within a restricted domain of d i s c o u r s e as in
t h e L A D D E R s y s t e m [33].
P a r s e r s of this t y p e are extremely fragile; they typically fail to p r o d u c e any s o r t of
if
their
input
obviously
deviates
from their grammar
by so much as a single w o r d .
parse
This
property
makes them extremely unsuitable for dealing with input w i t h missing o r
repeated
w o r d s , grammatical e r r o r s , or w i t h fragmentary input.
These
parsers
left-to-right
because
are
so
algorithms
fragile
explore
because of
the
set
of
the depth-first
potential
w a y in w h i c h their
parses.
When
a partial
top-down
parse
fails
t h e next w o r d d o e s not fall into one of the categories allowed b y the grammar
at
t h a t p o i n t , t h e f a i l u r e is taken to mean that the parser made an incorrect c h o i c e e a r l i e r ; that
p a r s e p a t h is a b a n d o n e d ; and a different choice is made at an earlier choice point.
unless
Clearly,
t h e input c o n f o r m s exactly to one of the possibilities allowed b y the grammar,
parse
at all w i l l result.
no
This fail and back-up procedure is so fundamental to t h e w a y
in
w h i c h s u c h p a r s e r s s e a r c h their o f t e n v e r y large space of possibilities that it is v e r y d i f f i c u l t
t o m o d i f y t h e i r algorithms to deal with repeated words, incorrect endings, etc..
Weischedel
a n d B l a c k [ 4 1 ] have made some p r o g r e s s in this direction, by arranging to relax
predicates
e n f o r c i n g s u c h grammatical constraints as noun-verb agreement w h e n e v e r a s t r a i g h t f o r w a r d
parse
fails, but
these
tactics can deal with only
a relatively
limited class of
grammatical
deviations.
An
approach
fragmentary
capable
of
particular
which
avoids
recognition
recognizing
much
involves
the
of
use
a description of
t y p e of c o n s t r u c t i o n .
this
fragility
of
one
and
a number
particular
opens
of
up
specialist
type
of
entity
the
possibility
subgrammars,
or
analysing
of
each
one
While each subgrammar is applied in the same t o p - d o w n ,
l e f t - t o - r i g h t f a s h i o n as b e f o r e , they are applied independently, so that failure of o n e t o f i n d a
p a r s e d o e s not affect the performance of the others.
his
PLANES
[ 4 0 ] system, w h i c h has a subgrammar
This is the a p p r o a c h taken b y W a l t z in
for each t y p e of entity
known to
the
s y s t e m ; if an input contains r e f e r e n c e s to several different entities k n o w n to the s y s t e m , t h e n
e a c h of t h e m is a n a l y s e d quite independently of the others.
W h i l e s p l i t t i n g u p the recognition in this w a y ensures that a single r e p e a t e d o r
word
or
grammatical
mistake
will
not
wreck
the
entire
parse,
the
parsing
omitted
of
each
20
subcomponent
problems
of
still
suffers
deciding
from
the
same
fragility
as
b e f o r e , and
there
are
the
extra
at w h i c h point in the input to apply each subgrammar, and of
s u b g r a m m a r a n a l y s i n g input w h i c h should have been analysed by a different o n e .
one
In a d d i t i o n ,
w h i l e it is d e s i r a b l e to be able to parse fragments if the need arises, it is also d e s i r a b l e t o
p a r s e u t t e r a n c e s as a w h o l e w h e n e v e r possible; if an utterance is always p a r s e d as a s e t of
fragments,
it
is
always
necessary
i n t e r p r e t a t i o n s of the fragments.
to
construct
a
complete
interpretation
In the case of Waltz's system, the domain w a s
out
of
the
sufficiently
c o n s t r a i n e d that the interpretation of a set of fragments w a s always unique, but in g e n e r a l
t h i s w i l l , of c o u r s e , not b e true.
More
possibility
for
flexibility in this area of missing and extra w o r d s and
grammatical
e r r o r s s e e m s to be o f f e r e d b y the conceptual parser of Riesbeck [32]. Since a p a r s e b y that
s y s t e m is o r g a n i z e d a r o u n d and directed b y the meaning of the key action or s t a t e in t h e
s e n t e n c e b e i n g p a r s e d , the parse could presumably be made less sensitive to the p r e s e n c e of
the correct function words.
greatly
In practice, this potential flexibility does not seem t o h a v e b e e n
e x p l o i t e d , and the system still relies heavily o n finding, for example, the
correct
p r e p o s i t i o n s in p r e p o s i t i o n a l phrases.
A q u i t e d i f f e r e n t s t y l e of analysis is based on pattern-matching
or r e w r i t e r u l e s .
In t h e
s i m p l e s t t y p e of pattern-matching approach, interpretation of input is o b t a i n e d b y m a t c h i n g it
as a w h o l e against a set of patterns of words.
was
[31].
This was essentially the w a y in w h i c h i n p u t
a n a l y s e d b y many e a r l y AI language processing programs, such as ELIZA [ 4 2 ] a n d SIR
E a r l y v e r s i o n s of P A R R Y [8] used an approach barely more sophisticated t h a n t h i s , a n d
p a t t e r n matching (of s t r u c t u r e s more complex than words) was the basis of Wilks
machine
t r a n s l a t i o n s y s t e m [43].
Flexibility
process.
in
Many
pattern-matching
of
the simpler
parsing
can be
obtained
by
flexibility
pattern matching systems used variables
in
the
in their
matching
patterns
w h i c h c o u l d match a r b i t r a r y strings, but while this resulted in systems w h i c h c o u l d a n a l y s e a
very
wide
range
of
input,
the
level
of
analysis
was
correspondingly
shallow.
More
s o p h i s t i c a t e d forms of flexibility could presumably be obtained b y allowing partial matches of
p a t t e r n s , w i t h p e r h a p s restrictions to ensure that some content w o r d s w e r e i n c l u d e d in t h e
m a t c h ; it is not h a r d to s e e how this could deal with repeated and omitted w o r d s , a n d in s o m e
c a s e s grammatical e r r o r s .
Surprisingly enough, this potential has not b e e n w i d e l y e x p l o i t e d ;
T h e m o r e a d v a n c e d v e r s i o n of PARRY [30], for instance, seems to r e l y o n exact matching of
v e r y g e n e r a l p a t t e r n s , rather than inexact matching of specific patterns to deal w i t h d e v i a n t
input.
Nevertheless,
pattern-matching
parsing seems to o f f e r
the greatest
d e a l i n g w i t h grammatical mistakes, and missing and repeated w o r d s .
potential
for
21
A f u r t h e r a d v a n t a g e of p a t t e r n matching for the purposes of flexible p a r s i n g is its a b i l i t y
t o h a n d l e idioms
definition
and f i x e d phrases.
unhelpful
in
their
In fact, since the structure of idiomatic p h r a s e s is
interpretation,
some
sort
of
wholistic
p a t t e r n - m a t c h i n g seems almost obligatory for their interpretation.
traditional
parsers
including
the
one
for
the
LUNAR
approach
by
such
as
Indeed, a n u m b e r of m o r e
system
[46]
have
a
distinct
p r e - p r o c e s s i n g s t a g e in w h i c h idioms are recognized by a pattern-matching p r o c e s s .
Pattern
matching d o e s , h o w e v e r , have its limitations.
utterances
by
corresponding
regularities
single
t o the
patterns,
there
will
be
If one tries to a n a l y s e
commonalities
between
the
regularities of expression in the domain of d i s c o u r s e .
that the more usual t y p e of grammar is designed to account for.
It
complete
patterns
is
The
these
solution
w i t h i n a p a t t e r n - m a t c h i n g system is to introduce r e w r i t e rules, substituting t h e r e s u l t s
p l a c e o f w h a t is matched.
in
In this way, patterns which account for regularities in t h e u s e o f
a u x i l i a r y v e r b s c a n b e combined w i t h patterns which recognize specific idioms t o p r o d u c e
language
a
a n a l y s e r , b a s e d o n pattern-matching, but capable of c o p i n g w i t h r e g u l a r i t i e s i n a
non-redundant way.
T h e p o w e r of this approach has b e e n s h o w n in more r e c e n t w o r k
on
P A R R Y [30].
T h i s u n c e r t a i n t y about w h e r e fragments begin and end makes any t o p - d o w n a p p r o a c h w i t h
e i t h e r a l e f t - t o - r i g h t o r r i g h t - t o - l e f t directionality inappropriate for f r a g m e n t a r y r e c o g n i t i o n .
A
b o t t o m - u p a p p r o a c h is indicated.
Again, a hierarchical pattern-matching a p p r o a c h
s e e m w e l l s u i t e d o n t h e s e grounds, but it has not b e e n used in practice this w a y .
the
special
uncertainties
HEARSAY-II
[18]
problem
a p p l y i n g strict
by
robustness
and
of
achieved
HWIM
by
i n c r e a s e d search effort.
role
to
play
since
their
medium, several speech parsers, including
[47] systems, have
dealt
w i t h the
grammars in a bottom-up non-directional
this
technique
is, however, bought
of
the
recognition
fashion.
at the p r i c e of
Because of
those
fragmentary
would
The
extra
considerably
T h e s e systems also suggest that t o p - d o w n r e c o g n i t i o n still h a s
the e x p a n s i o n of
already recognized fragments in a c c o r d a n c e w i t h
a
the
g r a m m a r c r e a t e s h y p o t h e s e s about what exists o n either side of the fragments, a n d if t h e s e
h y p o t h e s e s c a n b e v a l i d a t e d in a t o p - d o w n manner, efficiency is g r e a t l y e n h a n c e d .
O n c e f r a g m e n t s have b e e n recognized, it is necessary to construct an interpretation
the fragments.
from
In the P L A N E S system, it is assumed that there is a l w a y s e x a c t l y o n e w a y o f
c o m b i n i n g t h e i n t e r p r e t a t i o n of the fragments to produce a meaningful i n t e r p r e t a t i o n ; t h i s
s t r a t e g y w o r k s b e c a u s e of a highly limited domain of discourse. Other, more g e n e r a l , w o r k o n
f r a g m e n t c o m b i n a t i o n b y Fox and M o s t o w [11] deals also w i t h situations in w h i c h c o n f l i c t i n g
f r a g m e n t s h a v e b e e n found.
Most
parsers
d o not d e a l w i t h ellipsis, the omission from an utterance
of
details
that
22
context
c a n p r o v i d e (see also Section 6.3).
The problem is tackled in a limited w a y
for
p a r s e r s built b y LIFER [20], A n y input not recognized in the normal w a y is assumed t o b e a n
e l l i p s i s of a s e n t e n c e structurally analogous to the last previously r e c o g n i z e d \nput
and an
f
attempt
is made t o p a r s e the input according to each of the possible s t r u c t u r a l
analogies.
T h e p a r s e r of the GUS s y s t e m [2] deals with ellipsis in r e p l y t o a question (e.g. Q: " W h e n d o
you
want
t o g o ? " A: " t e n " , instead of A: "I want to go at ten"), b y i n s e r t i n g an o t h e r w i s e
u n r e c o g n i z a b l e input into a template generated from the question asked, ("I want t o g o at ..."
f o r " W h e n d o y o u w a n t to go?"), and then parsing the filled-out template in t h e normal w a y .
N e i t h e r mechanism c a n c o p e w i t h more general forms of ellipsis.
T h i s g e n e r a l a p p r o a c h to ellipsis is based o n the assumption that the " c o m p l e t e " f o r m o f
t h e e l l i p t i c a l input
must b e d i s c o v e r e d b e f o r e the input can b e analysed.
a p p r o a c h is t o t r e a t elliptical inputs as complete within their context.
An
alternative
In p r a c t i c a l t e r m s t h i s
m e a n s p r e d i c t i n g p o s s i b l e ellipses according to the context, and attempting t o p a r s e
d i r e c t l y f r o m t h e input.
Such a direct approach avoids the possibly u n n e c e s s a r y c o m p l i c a t i o n
o f e n c l o s i n g t h e (presumably) important part
h a v e t o u n w r a p it again in the analysis.
of an input in a "completing" c o n t e x t o n l y
be
predicted
easily
a l t e r n a t i v e approach^
enough
for
to
Carbonell [3] has attempted to deal w i t h e l l i p s i s i n
e s s e n t i a l l y this w a y in the context of question answering systems.
can
them
it to be efficient
to
Of c o u r s e , not all e l l i p s e s
recognize
them
this
way;
an
u n t r i e d as far as w e know, might b e to deal w i t h s u c h e l l i p s e s i n t h e
s a m e w a y as fragmentary
input.
3.3. Summary of Flexible Parsing
T h e l a n g u a g e u s e d in naturally
occuring dialogues o f t e n deviates from strict
n o r m s ; c o m m o n t y p e s of deviation include: omitted and repeated
words, incorrect
i n f l e x i o n s and p r e p o s i t i o n s , idiomatic, elliptical, and fragmentary utterances.
s y s t e m s t y p i c a l l y d o not deal w i t h these t y p e s of deviant input.
apply
a strict
grammar
grammar
to their
input
grammatical
Current
tenses,
parsing
In the case of p a r s e r s w h i c h
in a t o p - d o w n left-to-right
fashion, w h e t h e r
that
is semantically b a s e d or purely syntactic, there seem to b e fundamental d i f f i c u l t i e s
i n v o l v e d i n d e a l i n g w i t h Input
which does not conform to the grammar.
The majority
of
a t t a c k s o n d e v i a n t input seem to have taken place in the context of p a t t e r n - m a t c h i n g p a r s e r s ,
a n d a l t h o u g h this t y p e of parser
is less suited to the regularities of language t h a n a m o r e
t r a d i t i o n a l t o p - d o w n parser^ there
is reason to hope that these difficulties c a n b e o v e r c o m e .
Pattern-matching
themselves
to
parsers
the
kind
are v e r y well suited to deal w i t h idiomatic input, a n d also
lend
of
with
bottom-up
analysis
that
appears
necessary
to
deal
23
l a n g u a g e c a n b e itemized as follows:
-
S t r i c t t o p - d o w n l e f t - t o - r i g h t parsing schemes, such as those b a s e d o n A T N s ,
a p p e a r i r r e d e e m a b l y unsuited to deal with small grammatical deviations s u c h as
r e p e a t e d o r omitted w o r d s , incorrect inflexions, etc..
-
T h e s e p r o b l e m s can b e localized, but not overcome t h r o u g h the use of
s p e c i a l i s t subgrammars.
-
T h e basic a p p r o a c h of conceptual parsers - fitting an input into a p r e d e f i n e d
c o n c e p t u a l f r a m e w o r k suggested by the main action or state of an input - s e e m s
p o t e n t i a l l y much b e t t e r able to deal with grammatical irregularities, but has not
y e t b e e n e x p l o i t e d in this way.
-
P a t t e r n - m a t c h i n g p a r s e r s are in a similar situation; they have the p o t e n t i a l t o
d e a l w i t h grammatical irregularities through flexible partial matching schemes, b u t
t h i s p o t e n t i a l has b e e n little exploited.
-
Idioms and
approach.
-
F r a g m e n t a r y r e c o g n i t i o n requires a basically bottom-up approach.
-
S p e e c h - p a r s e r s demonstrate the advantages in efficiency a f f o r d e d b y a l l o w i n g
t o p - d o w n s t r a t e g i e s to be used after a context has b e e n e s t a b l i s h e d b y
b o t t o m - u p methods.
-
O n l y limited forms of ellipsis are handled by current parsing systems; all c u r r e n t
m e t h o d s r e l y o n filling in the ellipsed components, and parsing the c o m p l e t e d
i n p u t ; a little e x p l o r e d alternative strategy is to recognize ellipses as c o m p l e t e
i n p u t s , e i t h e r b y p r e d i c t i o n or in the same way as other fragments.
other
fixed
phrases
are
best
handled
by
a
small
pattern-matching
4. Domain Knowledge
In this s e c t i o n w e discuss the importance to a gracefully interacting s y s t e m of
about
its d o m a i n of
discourse; consider
knowledge
a class of domains and r e l a t e d tasks that
appear
e s p e c i a l l y a p p r o p r i a t e for gracefully interacting systems, and that will f i g u r e h e a v i l y in o u r
discussions
m e t h o d of
of
the
remaining
components
of
graceful
interaction; and
finally, d e s c r i b e
a
k n o w l e d g e r e p r e s e n t a t i o n appropriate for gracefully interacting s y s t e m s in s u c h
domains.
4.1. The Role of Domain Knowledge
A
computer
system
cannot
interact
gracefully
with
its
k n o w l e d g e about the domain that the interaction concerns.
user
unless
it
has
substantial
Such domain k n o w l e d g e is not,
h o w e v e r , a c l e a r l y s e p a r a b l e component of graceful interaction like those w e have d i s c u s s e d ;
r a t h e r , it is a p r e r e q u i s i t e or underpinning for the other components.
A parser, for instance,
24
requires
k n o w l e d g e of h o w w o r d s and the phrasings in which t h e y are u s e d r e l a t e t o
u n d e r l y i n g domain; and for the purposes of robust communication, it is important t o
the
know
w h a t a r e t h e important, information-bearing parts of a user's utterance, and w h i c h p a r t s c a n
be
safely
ignored, even
if
not
fully
understood; an explanation facility
clearly
requires
k n o w l e d g e of the abilities and mode of operation of the system itself.
For
t h e t w o c o m p o n e n t s of graceful interaction already discussed:
and flexible
parsing, we
were
robust
communication
able to describe the component in a more or
less
domain
i n d e p e n d e n t w a y . E v e n though examples w e r e necessarily r e s t r i c t e d in domain, it is not h a r d
t o s e e h o w the p r i n c i p l e s and requirements w e established apply to v i r t u a l l y a n y d o m a i n .
N o r d i d o u r d i s c u s s i o n d e p e n d at all o n the w a y in which the requisite domain k n o w l e d g e w a s
r e p r e s e n t e d ; w e w e r e able simply to assume that it was available.
N e i t h e r of t h e s e simplifying assumptions holds for the four components d i s c u s s e d in t h e
f o l l o w i n g four sections:
explanation facility, goals and focus mechanisms, i d e n t i f i c a t i o n
d e s c r i p t i o n s , and language generation.
from
While the components are relevant to and i m p o r t a n t
f o r a n e q u a l l y w i d e - r a n g e of domain as the preceding components, t h e y will b e
d i f f e r e n t a c c o r d i n g to the particular t y p e of domain involved.
significantly
In discussing them, w e w i l l t r y
t o g i v e a p e r s p e c t i v e o n them for a range of domains; but w e will c o n c e n t r a t e o u r
attention
o n o n e (quite b r o a d ) class of domains that w e will call simple-service domains, a n d w h i c h
i n c l u d e b o t h the d i r e c t o r y assistance and restaurant reservations domains that w e h a v e b e e n
following.
In t h e
following
two
subsections, w e will describe
this class of domains
a little
more
c a r e f u l l y , a n d c o n s i d e r a p p r o p r i a t e representations for knowledge about s u c h domains.
4.2. Simple Services
M a n y of t h e simple s e r v i c e s p r o v i d e d in current society, especially those o f f e r e d o v e r t h e
t e l e p h o n e , r e q u i r e in e s s e n c e only that the customer or client identify c e r t a i n e n t i t i e s t o t h e
p e r s o n p r o v i d i n g the service; these entities are parameters of the s e r v i c e , and o n c e t h e y a r e
i d e n t i f i e d t h e s e r v i c e can be p r o v i d e d . We will call tasks which can b e cast in this f r a m e w o r k
simple services.
Examples of such services are directory assistance (the p e r s o n , b u s i n e s s o r
o r g a n i z a t i o n w h o s e number is d e s i r e d is the entity that must be identified) and
restaurant
r e s e r v a t i o n s ( p a r t y s i z e , time, date must be identified); in fact, airline and any o t h e r t y p e of
r e s e r v a t i o n falls into this mould, along with requests for weather information, c u r r e n t
m a r k e t p r i c e s , and any other "current value" requests for information.
stock
A n additional s i m p l e
s e r v i c e , w h i c h f o r o b v i o u s reasons is not available from humans, is p r o v i d e d b y an e l e c t r o n i c
mail
s y s t e m ; the
p a r a m e t e r s for sending an item of electronic mail, f o r instance, a r e
the
25
s o u r c e a n d d e s t i n a t i o n , plus the (unanalysed) body of the message.
Examples
of
non-simple
service
domains
for
gracefully
interacting
systems
include
a s s e m b l y tasks as s t u d i e d by a g r o u p at Stanford Research Institute [39] in w h i c h an e x p e r t
supervises
a
novice
assembling
a mechanical
device
(the
obvious
role
for
a
i n t e r a c t i n g s y s t e m h e r e is the expert), or more generally any s u p e r v i s o r y or
task.
gracefully
instructional
A c o m p l e t e s e c r e t a r i a l service would also be b e y o n d the realm of simple s e r v i c e s .
W e h a v e s i n g l e d out simple service domains for two reasons:
tractability.
common
It
in
the
is
clear
real
from
the
w o r l d , so
above examples
their commonness, and t h e i r
that simple
a solution to the graceful
service
interaction
systems
problem
are
very
which
was
r e s t r i c t e d to simple s e r v i c e domains would still be extremely useful. F u r t h e r m o r e , w e b e l i e v e
that
the
simple
graceful
service
interaction
domains.
that
We
interaction problem is tractable in a relatively
systems.
we
still
are
claim
straightforward
way
In particular, w e believe that the set of components of
proposing
that
our
is
sufficient
for
systems
p r o p o s e d components
operating
are n e c e s s a r y
in
graceful
simple
for
all
i n t e r a c t i n g s y s t e m s , w h e t h e r operating in simple service domains or not, but other
for
service
gracefully
non-simple
s e r v i c e d o m a i n s may r e q u i r e extra skills.
4.3. R e p r e s e n t a t i o n s for Simple Service Domains
N o w w e t u r n to the question of how to represent knowledge about simple s e r v i c e d o m a i n s .
The
use
important
of
conversational
examples
are
the
systems
GUS
in simply
system
service
[2], which
domains
made
is
far
round-trip
from
plane
novel.
Two
reservations
b e t w e e n p a i r s of cities in California, and the PAL system [37], w h i c h s c h e d u l e d a p p o i n t m e n t s
from specifications
of participants, meeting place, time, e t c . * Both systems u s e d f r a m e s
to
o r g a n i z e and r e p r e s e n t their domain knowledge.
F r a m e s a r e a method of knowledge representation, named and p o p u l a r i z e d b y M i n s k y [ 2 9 ] ,
w h i c h s e e m p a r t i c u l a r l y w e l l suited to simple service domains because their s t r u c t u r e
t h e n a t u r a l s t r u c t u r e of a simple service:
reflects
a service specified through the d e s c r i p t i o n of
a
l i m i t e d s e t of e n t i t i e s w h i c h s e r v e as parameters to the service.
A
f r a m e is a r e p r e s e n t a t i o n of some entity in terms of the entities w h i c h make it up; a
p h y s i c a l o b j e c t c a n b e r e p r e s e n t e d in terms of its parts, an event can be r e p r e s e n t e d
terms
of
its
p a r t i c i p a n t s , and a more complex event can be r e p r e s e n t e d
in terms
of
in
its
Actually, PAL is not an interactive system. It accepts a monologue which specifies all the information necessary to
schedule a meeting. Nevertheless, its domain is simple service and many of the same problems, particularly in the area of
focus, arise as for a gracefully interacting system.
26
sub-events.
service
T h e components of a frame are called its slots.
systems
The use of frames in s i m p l e
is thus straightforward; the entities which the user must d e s c r i b e
s y s t e m c o r r e s p o n d to slots in a frame which represents the service as a w h o l e .
performs
to
the
The system
the s e r v i c e b y filling the slots according to the user's descriptions, plus
possibly
p e r f o r m i n g some manipulations on the completed frame.
The
u s e of frames p r o v i d e s a number of clear-cut advantages for g r a c e f u l i n t e r a c t i o n in
s i m p l e s e r v i c e domains.
First, a frame is a declarative data structure, so that is e a s y f o r
a
s y s t e m t o manipulate its components in whatever order and h o w e v e r o f t e n it d e s i r e s ; t h i s
a l l o w s a simple s e r v i c e system to deal with a user's descriptions of slots in w h a t e v e r
order
t h e y a r e p r e s e n t e d , and to go back and change the entity filling a slot if the user c h a n g e s his
mind.
F u r t h e r m o r e , the declarative nature of a frame allows the system easily t o k e e p t r a c k
o f w h a t has b e e n accomplished so far in a conversation, and what still has to b e d o n e .
If t h e
u s e r fails to v o l u n t e e r a description for any required slot, the resulting hole in the f r a m e is a
signal
for
the
system
a p p r o p r i a t e entity.
to
take
the
initiative
and ask
the
user
for
a description
of
an
In the same way, a full complement of filled slots can be the s y s t e m ' s c u e
t o p e r f o r m t h e s e r v i c e and attempt to terminate the conversation.
A s w e will s e e in S e c t i o n
6.3, a slot (or the goal of filling it) also provides a good representation for what the a t t e n t i o n
o f t h e s y s t e m and user is d i r e c t e d to at any given time.
service
in
the
form
of
a frame
provides
In short, k n o w l e d g e about a s i m p l e
a useful way
of
organizing
the
acquisition
of
i n f o r m a t i o n n e c e s s a r y to p e r f o r m the service.
The
slots
in
a frame
are
themselves
declarative
p o i n t e r s to information other than their actual fillers.
other
frames
which describe
the entities that
data structures,
and
so
can
contain
In particular, slots commonly r e f e r
may fill them; thus, the
person
slot
to
of
a
d i r e c t o r y a s s i s t a n c e frame might refer to a frame describing a p e r s o n w i t h slots f o r s u r n a m e ,
first
name, title, etc.; in turn
individual
letters.
a surname
might be d e s c r i b e d b y
a frame
with
slots
for
T h e s e r e f e r e n c e s to other frames serve as t y p e information to h e l p
s y s t e m d e c i d e in w h i c h slot a given description should be placed.
In addition, t h e
a
potential
t h e y p r o v i d e to r e p r e s e n t the grain of descriptions to essentially a r b i t r a r y levels of f i n e n e s s
(Jim Smith; Jim; a p e r s o n w h o s e first name begins with \T) is important for the f o c u s
and
i d e n t i f i c a t i o n f r o m d e s c r i p t i o n components, as w e shall see.
A n o t h e r t y p e of r e f e r e n c e found in slots is to procedures to be invoked w h e n a f i l l e r
is
f o u n d f o r a slot or w h e n an attempt is made to find a filler. This technique, c a l l e d p r o c e d u r a l
a t t a c h m e n t , w a s heavily exploited by the GUS system to allow transmission of f i l l e r s f r o m o n e
slot
to
another
and to
provide
special strategies to fill s e l e c t e d slots.
As
an
example,
consider
an e l e c t r o n i c mail system which, like ARPAnet mail, provides a s e p a r a t e F R O M
and
SENDER
f i e l d ; if
the
no F R O M is specified, then it is the same as the p e r s o n c o m p o s i n g
27
m e s s a g e ; if a d i f f e r e n t
procedure
to
transmit
F R O M is specified, then SENDER is the same as the c o m p o s e r .
the default contents of FROM to SENDER if a d i f f e r e n t
FROM
A
were
s p e c i f i e d w o u l d c l e a r l y be v e r y useful in this case.
A t h i r d t y p e of information that could be attached to slots in t e m p o r a r y information u s e d in
e s t a b l i s h i n g the c o n t e n t s of the slot.
For instance, if a system had only partially
understood
a u s e r ' s d e s c r i p t i o n of a slot, it could store in the slot what it had u n d e r s t o o d , plus p e r h a p s
s o m e h y p o t h e s e s about what it had not understood, the better to understand the u s e r ' s r e p l y
t o its r e q u e s t for clarification.
In addition, it is sometimes necessary to maintain t e m p o r a r y
i n f o r m a t i o n about alternative slot fillers, for insstance, w h e n the user's d e s c r i p t i o n of a slot is
a m b i g u o u s ( s e e S e c t i o n 7.2).
In s u m m a r y , frames p r o v i d e a natural representation for information about simple
domains: both
a priori
knowledge
about the domain itself
service
and information important
s y s t e m in i n t e r a c t i n g w i t h the user to formulate a request for service.
to
a
More specifically, the
a d v a n t a g e s of a f r a m e - b a s e d representation for a simple service domain include:
-
-
an e a s i l y manipulable representation for the overall task of interacting w i t h a
u s e r to formulate a request for a simple service; the structuring of a frame into
s l o t s makes it e a s y to tell:
-
w h a t p a r t s of the request have been formulated,
-
w h a t vital parts are missing,
-
and to what part the conversation is currently directed.
convenient
s t o r a g e associated with each frame slot for information r e l e v a n t
to
that slot including:
-
i n f o r m a t i o n about the t y p e and structure of the entities s u p p o s e d t o fill
the slot,
-
p r o c e d u r e s to transmit fillers appropriately from one slot to another
p r o v i d e special strategies to fill selected slots,
-
default fillers,
-
t e m p o r a r y information about partially described and alternative fillers.
and
In a d d i t i o n , f r a m e s have already b e e n used successfully in systems that p r o v i d e
services.
simple
28
5. Explanation Facility
A g r a c e f u l l y interacting system must be able to deal reasonably with any q u e s t i o n s that its
u s e r s e e s fit to p o s e .
to
us
the
most
In this section, we consider a limited set of question t y p e s that a p p e a r
important
for gracefully interacting systems, especially those o p e r a t i n g
s i m p l e s e r v i c e domains.* The set includes questions about the system's abilities, its
and
the
motives
questions
for
them, other
b a s e d o n these t y p e s .
events within the system's e x p e r i e n c e , and
We call the. ability to answer
questions of
in
actions
hypothetical
these
types
a p p r o p r i a t e l y an explanation facility.
T h e set of t y p e s does not cover all reasonable questions, e v e n for simple s e r v i c e d o m a i n s ,
so
that
a s y s t e m dealing only with these types could welf find itself f a c e d w i t h
questions
other
that it cannot answer, or even comprehend.
cases
explanation
of
complete
facility,
by
incomprehension, we
which
a gracefully
T o deal w i t h such c a s e s , and
also describe
interacting
legitimate
a technique, r e l a t e d
system
can
extricate
to
itself
s i t u a t i o n s in w h i c h its inability to comprehend has led to a totally c o n f u s e d dialogue.
5 . 1 . Explanation Types
T h e e x p l a n a t i o n facility w e p r o p o s e deals with four different question t y p e s :
-
Q u e s t i o n s about ability:
C a n y o u give me a number in Tokyo?
C a n I make a reservation?
H o w c a n I make a reservation?
C a n y o u give me a reservation for next Tuesday?
What places can y o u give listings for?
What c a n y o u do?
D e s p i t e the fact that many questions ostensibly about ability are in r e a l i t y
r e q u e s t s to p e r f o r m the embedded actions (with the service p r o v i d e r as the
agent i n s t e a d of " T in appropriate cases), some ability questions must b e
a n s w e r e d literally.
This requires a gracefully interacting system to h a v e an
e x p l i c i t model of its o w n abilities and (some of) its inabilities.
-
Q u e s t i o n s about e v e n t s :
What d i d y o u just say?
W h y d o y o u need to know my name?
Have y o u made a reservation for Mr. Smith?
W h y can't y o u make the reservation for eight?
Will y o u hold the reservation against late arrival?
W h y not?
This impression is based on an analysis of some human interactions in such domains.
with
the
from
29
D e a l i n g w i t h such questions clearly requires
a memory
for
what
has
already
o c c u r r e d in a c o n v e r s a t i o n , together with knowledge about the goal s t r u c t u r e of
t h e c o n v e r s a t i o n as d e s c r i b e d in Section 6.2.
—
H y p o t h e t i c a l questions (embedding either of the two preceding t y p e s ) :
C a n y o u g i v e me the number if I give y o u the address?
C o u l d y o u give me a reservation tonight if the party size w a s three?
Will y o u hold the reservation if I put d o w n a deposit?
A n s w e r i n g s u c h questions requires an ability to construct r e p r e s e n t a t i o n s of t h e
h y p o t h e t i c a l conditions without getting them confused w i t h the c u r r e n t s i t u a t i o n
o r t h e u s e r ' s p r e v i o u s l y stated preferences.
-
Factual questions:
W h e r e are y o u located?
Is t h e r e a c h a r g e for this service?
What are y o u r opening hours?
T h e s e q u e s t i o n s typically involve non-systematic domain-dependent
information,
a n d must b e a n s w e r e d o n an individual basis.
T h e r e p e r t o i r e of q u e s t i o n t y p e s that we propose to deal w i t h in our e x p l a n a t i o n f a c i l i t y is
v e r y f a r f r o m a full s p e c t r u m of all the types of questions that can occur in o r d i n a r y h u m a n
d i a l o g u e o r d i s c o u r s e ; this is clear from the range of question t y p e s c o n s i d e r e d in the
of C h a r n i a k [5], L e h n e r t [24], Scragg [35], and others.
Nevertheless, w e b e l i e v e that s u c h an
e x p l a n a t i o n facility, t o g e t h e r w i t h the method for dealing with c o n f u s e d dialogues
below
is
sufficient
for
a gracefully
interacting
work
system
in a simple
service
discussed
domain.
In
p a r t i c u l a r , t h e e x p l a n a t i o n facility p r o p o s e d here is of much greater s c o p e than that p r o v i d e d
i n m o r e c o n v e n t i o n a l q u e s t i o n - a n s w e r i n g systems such as LADDER [33], P L A N E S [ 4 0 ] , L U N A R
[46], etc.
w h i c h have
t e n d e d to concentrate on answering
factual questions; the
factual
q u e s t i o n s that s u c h systems answer can, however, be much more complicated and a r e d e a l t
w i t h in a much more systematic w a y than we are proposing here.
In t h e r e m a i n d e r of this section, we will consider the various question t y p e s in more d e t a i l .
T h e next t w o s u b s e c t i o n s will deal with ability questions, first in general human c o n v e r s a t i o n ,
a n d t h e n in r e s t r i c t e d domains.
questions respectively.
The next two subsections deal with event and h y p o t h e t i c a l
Finally, w e discuss a technique based o n the explanation f a c i l i t y
by
w h i c h a g r a c e f u l l y interacting system can extricate itself from situations in w h i c h its i n a b i l i t y
t o c o m p r e h e n d has c r e a t e d confusion.
5.2. Questions About Ability - Indirect Speech Acts
In g e n e r a l human dialogue, second person questions about abilities can be i n t e r p r e t e d
t w o q u i t e distinct w a y s .
in
T h e y can either be taken literally ("Can y o u swim?-; " C a n y o u lift
30
this
b a r - b e l l ? " ) , or
be
interpreted
as
requests
for
the
listener
to
perform
the
e m b e d d e d in the q u e s t i o n ( Can y o u open the window?-; "Can y o u tell me y o u r
M
action
address?").
T h e d i s t i n c t i o n b e t w e e n the two modes of interpretation does not d e p e n d o n the
question
i t s e l f , but r a t h e r o n the context (both linguistic and non-linguistic) in w h i c h it is s p o k e n .
e x a m p l e s g i v e n a b o v e to illustrate the two modes of interpretation could all b e
The
interpreted
t h e o t h e r w a y in a suitably c h o s e n context; the intended interpretations are simply t h e m o r e
common ones.
T h e d i f f e r e n c e b e t w e e n the two modes of interpretation is accounted for b y the linguistic
t h e o r y of s p e e c h acts [1], [36].
In brief, the theory says that the listener will i n t e r p r e t
an
a b i l i t y q u e s t i o n as a request to perform the embedded action if he believes that the s p e a k e r
already
open
k n o w s w h e t h e r or not he is able to perform the embedded action; thus, " C a n
the
w i n d o w ? " will generally
you
be interpreted as a request, because, unless t h e s e
are
u n u s u a l c i r c u m s t a n c e s , the listener will assume that the speaker believes that he is a b l e
to
open
an
t h e w i n d o w . Using an ability question to express a request in this w a y
is c a l l e d
i n d i r e c t s p e e c h act.*
The
with
m o d e of
the
i n t e r p r e t a t i o n of a second person question about ability is also
specificness
of
the question.
A non-specific
correlated
question such as ("Can
you
chop
w o o d ? " ) is much more likely to be interpreted as a literal question about abilities, t h a n a m o r e
s p e c i f i c q u e s t i o n ("Can y o u chop this wood?").
previously
This distinction is somewhat r e l a t e d t o
the
m e n t i o n e d rule for choosing b e t w e e n the two modes of i n t e r p r e t a t i o n , in that if
t h e s p e a k e r w e r e u n s u r e about a general ability of the listener, t h e r e w o u l d b e n o p o i n t
in
a s k i n g a b o u t a more specific ability.
One further
point to note is that e v e n though the listener i n t e r p r e t s an ability
l i t e r a l l y , t h e a l t e r n a t i v e mode of interpretation may also be relevant.
question
In particular, if t h e r e is
a n y p o s s i b i l i t y that the speaker may wish the listener to perform a p o s s i b l y more
specific
i n s t a n c e of the action embedded in the question, then the listener is likely to s u s p e c t that t h e
s p e a k e r actually d o e s w i s h him to perform such an action.
A:
C a n y o u c h o p wood?
B:
Y e s , w h a t do y o u want me to chop?
In t h i s e x a m p l e , it is a moot point whether B interpreted A's question literally or not.
most i m p o r t a n t point is that B realized that A might want B to c h o p some p a r t i c u l a r
The
wood,
e v e n t h o u g h A ' s q u e s t i o n was more general.
Of course, this is not the only kind of indirect speech act. Requests for action, for instance, can also be expressed
as questions about preferences ("Do you want to open the window?"), statements about preference ("I would like you
to open the window."), and even more obscurely ("It's cold in here.")
31
5.3. Questions A b o u t Ability in a Restricted Domain
In t h i s s u b s e c t i o n , w e consider what implications the human methods of dealing w i t h a b i l i t y
q u e s t i o n s h a v e f o r a g r a c e f u l l y interacting system.
It is s a f e to assume that for the forseeable future any gracefully interacting s y s t e m
operate
in
a highly
restricted
r e s t r i c t e d set of s e r v i c e s .
of
direct
questions
domain, and
be
able
to
provide
its
users
with
a
will
highly
Making this assumption allows us greatly to simplify the t r e a t m e n t
b y the user
about the systems ability.
In brief, if the user
asks
the
s y s t e m a b o u t its ability to do something that it can do, then it is reasonable to assume that
t h e u s e r is asking for the service to be performed; on the other hand, if the user asks a b o u t
the
system's
ability
to do something that it cannot do, a negative answer
is
appropriate.
T h u s , f o r a d i r e c t o r y assistance service:
U: C a n y o u tell me the number for Jim Smith?
S: Y e s , it's 5 6 2 9 .
U: C a n y o u connect me to Jim Smith?
S:
No, I'm s o r r y ! I Can't.
A n i n t e r e s t i n g c o n s e q u e n c e of this strategy is that a system requires no explicit model o f
its a b i l i t i e s to a n s w e r
ability questions which have positive answers.
Since s u c h q u e s t i o n s
a r e t r e a t e d in the same w a y * as requests to perform the embedded action, the s y s t e m n e e d
o n l y r e c o g n i z e the e m b e d d e d action and perform it.
To do this, the s y s t e m o n l y n e e d s t o b e
a b l e t o r e c o g n i z e exactly those actions that it can actually perform,.
C l e a r l y , this c o n s t i t u t e s
a n a b i l i t y m o d e l , but of an implicit kind.
The
chief
negative
with
a n s w e r s in the same w a y , and furthermore, relies on n o n - r e c o g n i t i o n of t h e a c t i o n
embedded
actions
d r a w b a c k to such an implicit model is that it treats all ability q u e s t i o n s
in the q u e s t i o n .
that
It is, of course, unrealistic
to imagine that all the
embedded
the s y s t e m cannot do can be recognized, bui recognition of some of the
more
l i k e l y o n e s w o u l d allow the system to help users with reasonable misconceptions a b o u t
the
s y s t e m ' s a b i l i t y , r a t h e r than just giving them a flat negative.
U: C a n y o u connect me to Jim Smith?
S:
N o ! Y o u ' l l have to dial yourself, the number is 5 6 2 9 .
A n o t h e r s i t u a t i o n in w h i c h an explicit model of ability and/or inability is useful o c c u r s w h e n a
service
has p a r a m e t e r s , and the system can only perform the service for c e r t a i n v a l u e s of
The treatment cannot be completely identical since "Yes, it's 5629" is not an appropriate response to "What is the
number for Fred Smith?", but apart from the format of the response there is no apparent need to discriminate.
32
the parameters.
U: C a n y o u g i v e n me a number in Rome?
S: No, I have only local numbers.
U: C a n y o u make me a reservation for next month?
S: No, w e o n l y accept reservations a week in advance.
T h e s e c o n d e x a m p l e particularly shows the importance of telling the user the limitations
the
s y s t e m p l a c e s o n the parameter; if it just replies in the negative, the user has n o w a y
of
k n o w i n g w h a t p r e v e n t s his request from being fulfilled, and consequently no w a y of d e c i d i n g
whether
he c a n modify his request so that the system can help him.
Note also that
this
s t r a t e g y a p p l i e s not only to requests phrased as ability questions, but also to r e q u e s t s in a n y
o t h e r format.
U:
I w o u l d like a r e s e r v a t i o n for next month.
S:
I'm s o r r y , w e only accept reservations a week in advance.
In d i s c u s s i n g questions about ability in human dialogues, w e o b s e r v e d that the more v a g u e
a n a b i l i t y q u e s t i o n , the less likely it was to be interpreted as a request for action.
t h a t a g r a c e f u l l y interacting system need not make this distinction.
positive
answers
c a n be
treated as requests
for
It s e e m s
All ability q u e s t i o n s
action, while all w i t h n e g a t i v e
with
answers
s h o u l d b e a n s w e r e d negatively, giving whatever extra information is a p p r o p r i a t e as d i s c u s s e d
above.
U: C a n y o u give me a number?
S: Y e s , w h o s e number would y o u like?
U: C a n y o u make me a reservation?
S: Y e s , w h a t time w o u l d y o u like?
A s t h e s e e x a m p l e s suggest, the requests implied by vaguer ability questions are t y p i c a l l y t o o
v a g u e to b e f u l f i l l e d d i r e c t l y .
In other words, the vagueness takes the form of the
absence
o f (or v a g u e n e s s in) c e r t a i n parameters of the requested task, without w h i c h the task is not
well-defined.
In the examples above, the user does not specify the name of the p e r s o n
w h o m t h e n u m b e r is r e q u i r e d or the time or other details of the r e s e r v a t i o n .
for
Appropriate
a c t i o n f o r the s y s t e m is not to reject the request because it is incompletely s p e c i f i e d , b u t
rather
to
ask
for
s p e c i f y i n g information, as in the examples above.
Of c o u r s e , t h e
extra
i n f o r m a t i o n the user goes on to supply may specify a request that cannot b e f u l f i l l e d , b u t
s u c h p r o b l e m s must be dealt with as they arise.
as
a question
formulated
as
about
ability
other
question
e s s e n t i a l l y t h e same w a y .
Again, note that formulation of the r e q u e s t
has little to do with the system's
types, statements, or
imperatives
response; vague
should
be
dealt
requests
with
in
33
There
action.
are e x c e p t i o n s
First,
generally
to our proposal for handling all ability questions
"wh"-questions
about
ability
("Where can y o u
give
as r e q u e s t s
listings
for? )
b e a n s w e r e d directly; although if the answer is a w k w a r d b e c a u s e of
its s i z e
s o m e o t h e r r e a s o n , it may still be appropriate to treat the question as a r e q u e s t f o r
("Where
d o y o u want a listing for?").
embedded
"What
or
action
Secondly, the extremeness of v a g u e n e s s in w h i c h t h e
a c t i o n is " d o " ("Can y o u do something for me?", or more naturally in " w h " - f o r m ,
c a n y o u do?") should cause ability questions to be treated literally, and make
s y s t e m g i v e an accounting of its abilities.
have
for
should
M
an explicit
model of
the
Clearly, both these exceptions r e q u i r e a s y s t e m t o
its o w n abilities.
They c o r r e s p o n d closely
to calls o n
a
"help
f a c i l i t y " of the t y p e commonly found in current (non-graceful) interactive systems.
T o e n d this s e c t i o n , w e might note two points:
first, virtually all that has b e e n s a i d a b o v e
a b o u t s e c o n d - p e r s o n ability questions applies equally to many f i r s t - p e r s o n ability q u e s t i o n s
("What c a n I do?", "Can I make a reservation for 7 PM?") in which the user p l a c e s himself
the active role.
in
In fact, the only cases in which the transformation is not applicable a r e t h o s e
in w h i c h t h e u s e r w a s a recipient in the second-person version ("Can y o u tell me the n u m b e r
f o r J i m Smith?").
In g e n e r a l , then, f i r s t - p e r s o n ability questions should be t r e a t e d e x a c t l y t h e
s a m e as t h e c o r r e s p o n d i n g s e c o n d - p e r s o n questions.
Secondly, " h o w " questions ("How c a n I
m a k e a r e s e r v a t i o n ? " , or a v e n "How do I make a reservation?") can usually be t r e a t e d in t h e
s a m e w a y as the c o r r e s p o n d i n g ability question without the how, i.e. as r e q u e s t s to p e r f o r m
the
action
specified.
" H o w " questions for
which this transformation
does
not
produce
a
r e q u e s t f u l f i l l a b l e b y the system ("How can I get there?", "How do y o u get any c u s t o m e r s at
t h o s e p r i c e s ? " ) are either informational questions which should be r e c o g n i z e d s e p a r a t e l y ( s e e
S e c t i o n 5.1), or are questions that a gracefully interacting simple service s y s t e m c o u l d not b e
e x p e c t e d to a n s w e r satisfactorily.
5.4. Event Questions
In this s e c t i o n , w e t u r n from ability questions to questions about events, c o v e r i n g e v e n t s
b o t h in t h e past and in the future, and the motives behind them.
q u e s t i o n s , the
As in the c a s e of
r e s t r i c t e d domains of forseeable gracefully interacting s y s t e m s make
q u e s t i o n s much more tractable.
The only event questions to which a g r a c e f u l l y
ability
event
interacting
s y s t e m c a n b e e x p e c t e d to give an informed reply are those concerning its o w n a c t i o n s a n d
t h e i r r e s u l t s a n d motives, along with those actions of its users which it can o b s e r v e d i r e c t l y .
Q u e s t i o n s a b o u t e v e n t s outside its direct experience will be incomprehensible.
E x a m p l e s of q u e s t i o n s that should be dealt with include:
W h a t d i d y o u just say?
Did y o u just ask for my name?
34
W h y d o y o u want to know my name?
W h y w o n ' t y o u make the reservation for 8pm?
a n d u s i n g e x a m p l e s from an electronic mail domain:
Has the message I cent y e s t e r d a y been delivered successfully?
Will this message be d e l i v e r e d immediately?
T h e f i r s t t w o examples are explicit indications of incomprehension as d i s c u s s e d in S e c t i o n 2.3.
Such questions
about what has just been said s e r v e a robust communication f u n c t i o n
should therefore
b e t r e a t e d as special cases.
and
Answering other event q u e s t i o n s r e q u i r e s
a
h i s t o r y of all the e v e n t s that have taken place in the current interaction, plus in the c a s e of
t h e s y s t e m ' s actions, the reasons for them.
first
The SHRDLU system of W i n o g r a d [ 4 4 ] w a s
to maintain s u c h a history of events and their reasons.
shows,
event
questions
can
refer
to
events
outside
the
As the penultimate
current
the
example
interactive
session.
P r e s u m a b l y , it is impractical to maintain a detailed history of all p r e v i o u s i n t e r a c t i o n s ,
and
i n s t e a d s u f f i c i e n t to remember only "major" events, such as the d e l i v e r y of a message in t h e
c a s e of an e l e c t r o n i c mail system.
Questions to the system about its f u t u r e actions, as in t h e
last e x a m p l e , r e q u i r e the system to "imagine" that the future has a r r i v e d under
conditions
and
to
observe
the
results; such
questions
are
thus
closely
prescribed
related
to
the
B e f o r e l e a v i n g this section, w e should note that, like ability questions, e v e n t q u e s t i o n s
can
h y p o t h e t i c a l q u e s t i o n s discussed in the next section.
a l s o b e i n d i r e c t s p e e c h acts.
not
an a c c e p t a b l e
asking
for
For most y e s - n o questions, a bare negative a n s w e r is g e n e r a l l y
r e s p o n s e , since such questions normally involve
an e x p l a n a t i o n
in the
negative
case.
Thus, for
"Was
an indirect
the
speech
message
delivered
s u c c e s s f u l l y ? " , it is unacceptable merely to reply "No!"; the reason for n o n - d e l i v e r y
also
be
given.
i n f o r m a t i o n , but
Again,
"Will
the
message
be
delivered
immediately?"
not
w o u l d normally be interpreted as an indication that the user
m e s s a g e to b e d e l i v e r e d immediately.
act
only
should
seeks
wishes
the
As a third and final example, the user of a r e s t a u r a n t
r e s e r v a t i o n s s e r v i c e s a y s in the middle of his conversation "Did I say there w o u l d b e e i g h t in
t h e p a r t y ? " , he is p r o b a b l y not e v e n interested in the direct answer to the q u e s t i o n , but o n l y
i n m a k i n g s u r e that the system knows there will be eight in the party.
The recognition
of
s u c h s p e e c h acts is a difficult problem which has been addressed in more g e n e r a l t e r m s
in
t h e w o r k of C o h e n [7], L e v i n and M o o r e [25], Lehnert [24], and others.
Adapting such w o r k
t o t h e r e q u i r e m e n t s of graceful interaction, and using all the constraints that simple s e r v i c e
d o m a i n s g i v e is an important but unsolved problem.
5.5. Hypothetical Questions
B e s i d e s b e i n g able to answer questions in the context of the current situation, a g r a c e f u l l y
i n t e r a c t i n g s y s t e m must also be able to deal with questions based on a h y p o t h e t i c a l c o n t e x t
35
i n v e n t e d b y the user.
easier
A s usual, dealing with such questions in limited domains is v e r y
t h a n f o r g e n e r a l human conversation, simply because the number of v a r i a b l e s
much
about
w h i c h h y p o t h e s e s can be made is so v e r y restricted.
In t h e c a s e of simple s e r v i c e systems, the most important class of h y p o t h e s e s are
those
c o n c e r n i n g the value of one of the slots of the service or one of the subslots of a slot.
A
u s e r may ask an ability question based on the hypothesis that such a slot has a c e r t a i n v a l u e .
o r e v e n that the (unspecified) value of the slot is known to the system. Examples i n c l u d e :
C a n y o u g i v e me a r e s e r v a t i o n if the party-size is three?
C a n y o u g i v e me the number, if I give y o u the address?
U: P d like a r e s e r v a t i o n for this evening.
S: I'm a f r a i d w e ' r e fully booked.
U: C o u l d y o u give me a reservation for tomorrow?
To
answer
such
questions,
a
gracefully
interacting
system
must
first
construct
r e p r e s e n t a t i o n of the h y p o t h e s i z e d situation, and then evaluate and a n s w e r the q u e s t i o n .
a
In
d o i n g t h i s , the s y s t e m must be careful not to confuse the r e p r e s e n t a t i o n of the h y p o t h e t i c a l
s i t u a t i o n w i t h any other conflicting situations the user has previously s p e c i f i e d , as in t h e last
example.
The
user may w i s h to return to his previous specification, or e v e n to
introduce
o t h e r s , a n d s w i t c h b e t w e e n them several times (this might be quite common for, s a y ,
airline
r e s e r v a t i o n system).
alternative
switching
knowledge
situations
back
and
In other words, the system must have a w a y of
representing
independently, of sharing common information b e t w e e n them, a n d
forth
between
the
alternatives.
Such ideas
have
been
explored
r e p r e s e n t a t i o n systems such as CONNIVER [38] and NETL [10], and in w o r k
H e n d r i x [ 2 1 ] , H a y e s [16], and others.
an
of
in
by
The methods thus d e v e l o p e d are suitable f o r u s e w i t h
t h e f r a m e s t y p e of r e p r e s e n t a t i o n p r o p o s e d in Section 4.3.
Even
after
a representation
for
the hypothesis has been constructed, it is not
always
p o s s i b l e to treat a hypothetical question in the same w a y as its embedded q u e s t i o n w o u l d b e
treated
if
it
were
posed
in a context
corresponding
p a r t i c u l a r , the indirect s p e e c h act may be different.
eight
pm
tonight?",
spoken
in
a context
to the
situation
hypothesized.
In
Thus "Can y o u give me a r e s e r v a t i o n f o r
in which
the
party
size
has
been
previously
e s t a b l i s h e d w o u l d normally be interpreted as a directive to make a r e s e r v a t i o n at the s t a t e d
t i m e if that is p o s s i b l e .
the
party
size
On the other hand, "Can y o u give me a r e s e r v a t i o n for 8 p m t o n i g h t , if
is t h r e e ? " w o u l d normally be interpreted as an e x p l o r a t i o n of
possibilities,
r a t h e r t h a n a r e q u e s t for action, and might well have been p r e c e d e d by a d i s c u s s i o n in w h i c h
t h e p a r t y - s i z e w a s d i f f e r e n t ; it should therefore be answered literally.
H o w e v e r , o n c e it h a s
a n s w e r e d y e s or no, the system should not forget the hypothesis, since the user may t h e n
d e c i d e t o make a r e s e r v a t i o n based on that hypothesis.
36
F i n a l l y , n o t e that h y p o t h e s e s can be made about things other than slot values, as i n :
h a d c a l l e d t w o h o u r s ago w o u l d y o u have been able to give me a reservation?".
s u c h a q u e s t i o n w o u l d require an ability to change the context b y r e s t o r i n g the
d a t a - b a s e to its state t w o hours ago.
"If I
Answering
reservations
For a practical system, one w o u l d have to b a l a n c e t h e
u t i l i t y of a n s w e r i n g s u c h questions against the expense of facilities for changing the c o n t e x t
in t h e w a y r e q u i r e d .
5.6. A b i l i t y M o d e l s as a Safety Net
In
the
preceding
sections w e have discussed what is essentially
a passive
explanation
f a c i l i t y : o n e that issues explanations in response to a user's e x p r e s s d e s i r e s ( w h e t h e r
desires
a r e e x p r e s s e d literally, or in an indirect speech act).
those
In this s e c t i o n w e d i s c u s s a
m o r e a c t i v e t y p e of explanation facility: one which volunteers explanations of its o w n a b i l i t i e s
in c i r c u m s t a n c e s
in w h i c h the user might benefit from such explanations.
We only
propose
o n e s u c h c i r c u m s t a n c e h e r e - w h e n the system has repeatedly failed to u n d e r s t a n d t h e u s e r
a n d the c o n v e r s a t i o n is in danger of breaking d o w n completely.
system
there
might
be
others, such
as when
the
user
For a more
appeared
to
be
sophisticated
using
a
more
c o m p l i c a t e d s t r a t e g y to accomplish his goals than was really necessary.
We
claim
then
that,
when
a
gracefully
interacting
system
has
repeatedly
failed
to
u n d e r s t a n d its user, it should use its ability model to let the user know what it c a n d o f o r
him, p l u s p o s s i b l y what it needs to know to do it. There are two benefits: first, it a l l o w s t h e
user
t o s e e w h e t h e r the system can help him in what he wants to do; and s e c o n d l y , if t h e
s y s t e m c a n h e l p , the w a y in w h i c h it states its abilities suggests to the user a p h r a s i n g
for
h i s r e q u e s t s that the system w o u l d be able to understand.
U: What is the number for Jim Smith?
S: Jim Smith's number is 2597.
U: T h a n k y o u . Do y o u know how the weather is w h e r e y o u are?
S: A r e y o u asking me for another number?
U: N o ! I'm asking y o u about the weather.
S: F m s o r r y ! All I can do is give y o u telephone numbers. I am completely
i g n o r a n t of e v e r y t h i n g else.
In t h i s e x a m p l e , the system cannot understand the user's questions about the w e a t h e r .
The
f i r s t time he asks, it asks a question based on what it expected him to say (this s t r a t e g y w a s
d i s c u s s e d in s e c t i o n 2.3), but this does not produce anything more u n d e r s t a n d a b l e f r o m t h e
user.
T h e s y s t e m t h e n assumes that the conversation is breaking d o w n and uses its
m o d e l t o i n f o r m the user of what it can do for him (and intensionally what it cannot).
ability
At this
p o i n t , t h e u s e r is fully a w a r e of w h y the system is not cooperating more d i r e c t l y w i t h him b y
f u l f i l l i n g his r e q u e s t for information about the weather, and so is much less likely to b e c o m e
37
f r u s t r a t e d t h a n if he didn't understand the reason for the problem.
T h e f o l l o w i n g example s h o w s how the w a y the system phrases its statement of abilities c a n
a i d t h e u s e r in formulating his request.
We
U:
I w a n t to come and eat at your restaurant.
S:
I'm s o r r y ! All I c a n do is make reservations.
e v e r y t h i n g else.
are
unusual
assuming
that
the
w o r d i n g , and that
system cannot
I am completely ignorant about
understand the user's
request
because
this fragment was p r e c e d e d by other unsuccessful
c o m m u n i c a t i o n , so that the system has fallen back on its ability model.
of
the
attempts
at
Now, if the u s e r c a r e s
t o h e e d the s y s t e m ' s statement of its ability, he might try something like: " T h e n p l e a s e make
me a r e s e r v a t i o n . " , w h i c h the system should be able to understand.
implies
that
Of c o u r s e , this
feature
the s y s t e m can comprehend any appropriately transformed v e r s i o n of its
own
s t a t e m e n t of abilities (see Section 8).
F i n a l l y , n o t e that this use of an ability model, while vitally important for a c o m p u t e r s y s t e m
o f l i m i t e d u n d e r s t a n d i n g , has little basis in real human dialogues.
humans
understand
conversations
each
other
so
well
and
have
such
a
The r e a s o n for this is c l e a r :
breadth
of
knowledge
h a r d l y e v e r break d o w n into repeated incomprehensions; s o humans d o
h a v e a n y r e a l n e e d for a last resort strategy of the kind w e have discussed.
5.7. Summary of Explanation Facilities
W e c a n summarize this section as follows:
-
A g r a c e f u l l y interacting question should be able to answer s e v e r a l d i f f e r e n t
t y p e s of q u e s t i o n from its user, including questions about its abilities, about its
a c t i o n s and the motives for them, about other events in its past e x p e r i e n c e , a n d
h y p o t h e t i c a l q u e s t i o n s based on all these other types.
-
In human dialogue, s e c o n d - p e r s o n questions about ability are i n t e r p r e t e d in o n e
of t w o distinct modes:
either literally or as requests to p e r f o r m the a c t i o n
e m b e d d e d in them.
-
In g e n e r a l , a g r a c e f u l l y interacting system need not make this d i s t i n c t i o n ;
s e c o n d - p e r s o n ability questions with positive answers can be t r e a t e d as
r e q u e s t s for action, and those with negative answers should be a n s w e r e d
n e g a t i v e l y . This d o e s not require an explicit ability model.
-
H o w e v e r , an explicit model is useful for the negative cases w h e n it is o n l y
i n c o r r e c t p a r a m e t e r s that prevent the system from performing the e m b e d d e d
a c t i o n , and for giving useful information besides a flat negative in o t h e r
c o m m o n l y o c c u r r i n g negative cases.
-
O t h e r c a s e s in w h i c h ability models are important are " w h - " ability questions, a n d
that
not
38
v e r y v a g u e ability questions in which the embedded action is "do."
M o s t f i r s t - p e r s o n ability questions can be treated in exactly the same w a y
t h e c o r r e s p o n d i n g s e c o n d - p e r s o n questions.
as
T o a n s w e r e v e n t questions, including those about its o w n actions and the motives
f o r them, a s y s t e m needs to maintain a history of all events that have t a k e n
p l a c e in the c u r r e n t interaction, plus major events from previous i n t e r a c t i o n s ,
t o g e t h e r w i t h its r e p r e s e n t a t i o n of the goal structure of the c o n v e r s a t i o n at t h e
time of t h o s e e v e n t s .
E v e n t q u e s t i o n s , especially those of a y e s / n o type can also be indirect
speech
acts.
A n s w e r i n g h y p o t h e t i c a l questions requires an ability to construct and s w o p
b e t w e e n t e m p o r a r y contexts which represent the conditional p a r t s of t h e
h y p o t h e t i c a l questions; the range of aspects of the context w h i c h c a n b e
m a n i p u l a t e d in this w a y restrict the range of hypothetical questions w h i c h c a n b e
dealt with.
A g r a c e f u l l y interacting system can also use its model of its o w n abilities t o
e x t r i c a t e itself from totally confused dialogues b y telling the user w h a t it c a n
a n d cannot do for him.
6. Goals and Focus
In t h i s s e c t i o n w e deal w i t h the intimately related topics of the goals of the p a r t i c i p a n t s i n
a c o n v e r s a t i o n and what their attention is focused on at any given point in the c o n v e r s a t i o n .
Both
these
items
are
very
complicated
in general
t r e a t m e n t is far b e y o n d the scope of this paper.
human c o n v e r s a t i o n
and
a
thorough
Instead, w e will largely c o n f i n e o u r s e l v e s t o
d e a l i n g w i t h t h e s e phenomena in gracefully interacting simple service systems.
We will find
t h a t h i g h - l e v e l goals can be treated v e r y simply for such systems, since t h e r e are v e r y
few
g o a l s that the s y s t e m can recognize in the user (essentially only those to make u s e o r
find
o u t a b o u t the s e r v i c e s p r o v i d e d by the system, plus an undistinguished set of o t h e r s that t h e
s y s t e m c a n n o t h e l p with), and since the system has no goals except to help the u s e r s a t i s f y
his.
S u b g o a l s that arise during the satisfaction of the top-level goals must, h o w e v e r ,
treated
more
generally
in
order
for
a system
to
understand
what
is
happening
in
be
a
conversation.
T h e p a r t i c i p a n t s in a conversation share a common view of what the c o n v e r s a t i o n is a b o u t ,
w h i c h w e call the focus of the conversation.
This shared focus allows them to e c o n o m i z e in
what
and ellipsis.
they
say
through
anaphoric reference
It is thus v e r y
important
for
a
g r a c e f u l l y i n t e r a c t i n g system to be able to follow the shifting focus of a c o n v e r s a t i o n in o r d e r
t o i n t e r p r e t its users* ellipsis and anaphora. Goals and focus are clearly highly r e l a t e d in a n y
c a s e , but
for
simple s e r v i c e
systems w e find it expedient
to equate them.
While
other
39
d e f i n i t i o n s of f o c u s have aided in the resolution of anaphora, the v i e w w e p r o p o s e aids also
in t h e t r e a t m e n t of ellipsis. We will describe how this view of focus helps in the r e s o l u t i o n of
e l l i p s i s a n d a n a p h o r a , and discuss strategies for keeping track of it.
6 . 1 . Goals
H u m a n c o n v e r s a t i o n is v e r y goal-oriented.
In other words, virtually e v e r y t h i n g a p e r s o n
s a y s d u r i n g a c o n v e r s a t i o n is intended to achieve some definite aim. A n ability to d e t e r m i n e
t h e g o a l s of its user is thus of prime importance for a gracefully interacting s y s t e m .
In g e n e r a l
varied.
human c o n v e r s a t i o n , the types of goals that can be p u r s u e d are
They
extremely
range
from highly specific (trying to find someone's
telephone
many
and
number)
to
v a g u e (maintaining "interesting" small-talk), from quite s t r a i g h t f o r w a r d ( t r y i n g
to
f i n d o u t s o m e o n e ' s o r i g i n b y a direct question) to downright devious (trying to
manoeuvre
s o m e o n e into p r o n o u n c i n g a certain w o r d so that y o u will know his origin); f r o m c o m p l e t e l y
cooperative
(obtaining
a number
antagonistic
(a cross-examination
from a helpful
at
a trial).
directory
assistance
Fortunately, as w e
operator)
shall
see, a
to
totally
gracefully
i n t e r a c t i n g simple s e r v i c e system need concern itself with only a highly r e s t r i c t e d s u b s e t
of
t h e s e v a r i o u s t y p e s of goals.
Not
s u r p r i s i n g l y , the goals people have affect what they say and the w a y t h e y
say
it.
E v e n m o r e i m p o r t a n t l y from our point of view, the w a y a listener i n t e r p r e t s what a s p e a k e r
s a y s is d e p e n d e n t u p o n the listener's view of the speaker's goals.
T h e w a y in w h i c h
this
h a p p e n s is the s u b j e c t of the linguistic theory of speech acts by Austin [1] and S e a r l e [ 3 6 J ,
a n d has also b e e n studied in more computationally oriented w o r k by C o h e n [7] and M a n n ,
M o o r e , a n d L e v i n [27].
T h e p r o b l e m s i n v o l v e d in dealing computationally with the full range of goals and s p e e c h
a c t s a r e immense.
In b o t h pieces of work cited, it was found necessary to maintain a m o d e l
o f t h e b e l i e f s of b o t h participants in a conversation, including beliefs about the b e l i e f s o f t h e
other participant.
In addition, Levin and Moore [25] found it useful to r e c o g n i z e a n u m b e r of
d i s t i n c t p a t t e r n s of e x p r e s s e d goals and characteristic responses; examples i n c l u d e d
asking
f o r i n f o r m a t i o n , r e q u e s t i n g help with a problem, and requesting a service.
While
the
work
just
mentioned has explored the problem, many u n r e s o l v e d
difficulties
r e m a i n in the r e c o g n i t i o n and modelling of all the types and patterns of goals that c a n a r i s e in
general conversation.
So it is fortunate that graceful interaction, at least for simple
service
d o m a i n s , i n v o l v e s the recognition and modelling of only a highly r e s t r i c t e d set of goal t y p e s .
F o r t h e r e m a i n d e r of the section w e will concentrate our attention on this r e s t r i c t e d p r o b l e m .
40
A
gracefully
interacting
simple
service
system
can
make
the
following
two
sweeping
s i m p l i f i c a t i o n s in its modelling of goals:
-
it has no i n d e p e n d e n t goals of its own; its only goal is to help the user fulfill his
goals;
-
t h e u s e r ' s goals are either to avail himself of the system's highly limited
s e r v i c e s , or fall into an undistinguished class for which the system is unable to
h e l p the user.
N o t e that t h e s e simplifications apply only to primary or top-level goals; l o w e r - l e v e l g o a l s o r
s u b g o a l s w h i c h c a n arise during an attempt to satisfy a top-level goal must b e t r e a t e d m o r e
g e n e r a l l y as w e will see in the next section.
The
two
restrictions
require
little justification.
The first follows from the
g r a c e f u l l y i n t e r a c t i n g system w h o s e reason for existence is to s e r v e its user.
notion
of
a
Such a s y s t e m
w o u l d h a v e no i n t e r e s t s of its o w n to w o r r y about, as does e v e r y human p a r t i c i p a n t
in a
c o n v e r s a t i o n , and so it n e e d have no goals other than those it can r e c o g n i z e in its u s e r .
In
a t t e m p t i n g to s a t i s f y its u s e r s ' goals, it can, of course, generate subgoals w h i c h c a u s e it t o
t a k e t h e initiative in the interaction, but such initiatives are always s u b s e r v i e n t to t h e u s e r ' s
t o p - l e v e l goal.
T h e s e c o n d r e s t r i c t i o n is appropriate because it is futile for a gracefully i n t e r a c t i n g s y s t e m
t o r e c o g n i z e goals in the user that it cannot help to fulfil.
The only goals that a g r a c e f u l l y
i n t e r a c t i n g s y s t e m needs to recognize in its user are those within its domain of e x p e r t i s e ,
p l u s p e r h a p s a f e w other closely related areas that it explicitly knows it cannot fulfil, but f o r
w h i c h it c a n o f f e r some helpful advice (see Section 5.3).
graceful,
it
can
incomprehension.
treat
all other
goals
that
the
user
Without becoming s i g n i f i c a n t l y
might
express
with
less
undifferentiated
F o r instance, it would be of little use for a d i r e c t o r y assistance p r o g r a m t o
r e c o g n i z e that its user w a s t r y i n g to find out the state of the weather, or t r y i n g t o ask it o u t
t o d i n n e r , s i n c e it w o u l d be quite unable to fulfil either of those goals of the user.
N e i t h e r of t h e s e simplifications is generally true of gracefully interacting s y s t e m s
t h e s i m p l e s e r v i c e class.
outside
A graceful supervisory or instructional program might, f o r i n s t a n c e ,
h a v e t o c h o o s e b e t w e e n cooperating in fulfilling a user's e x p r e s s e d goal, w h i c h it k n o w s
b e a b a d g o a l to have, and c o r r e c t i n g the user.
to
Again, the range of goals that a u s e r might
r e a s o n a b l y e x p r e s s to a secretarial system coultf be quite large.
In s u m m a r y , the combination of recognizing a highly limited number of goals in t h e
user,
a n d making the u s e r ' s goals the system's own, allows the treatment of h i g h - l e v e l g o a l s
for
s i m p l e s e r v i c e domains to be extremely simple. The system need have no h i g h - l e v e l g o a l s of
41
its o w n ; it just c o o p e r a t e s with the user in fulfilling his goals.
assume
that
M o r e o v e r , the s y s t e m
can
the goals of the user are either to use (and/or p e r h a p s find out a b o u t ) t h e
s e r v i c e it o f f e r s , or are outside the competence of the system. Thus, g r a c e f u l i n t e r a c t i o n in a
s i m p l e s e r v i c e domain does not require a sophisticated model of its o w n and the u s e r ' s g o a l s
a n d m o t i v a t i o n s , s u c h as w o u l d be necessary for a more general interaction.
6.2. S u b g o a l s In Simple Service Domains
W h i l e t o p - l e v e l goals can be treated in a v e r y simplified w a y by a g r a c e f u l l y
simple
service
s y s t e m , the subgoals that can arise during fulfillment of the t o p - l e v e l
must b e t r e a t e d much more generally.
can
be
interacting
nested
arbitrarily
As w e will see, subgoals for simple s e r v i c e
goals
systems
d e e p l y , can originate from the system as w e l l as the u s e r ,
and
i n v o l v e t h e a c q u i s i t i o n or imparting of specific pieces of information or d e s c r i p t i o n s .
S u b g o a l s arise b e c a u s e t o p - l e v e l goals often cannot be accomplished in a single s t e p .
a
user
may
reservation
be
unable
(time,
party
or
unwilling
to specify
size, etc.) in
a single
all the components
utterance.
of, say, a
Instead, he
may
Thus
restaurant
spread
s p e c i f i c a t i o n s o v e r s e v e r a l different utterances; w e can say that in e a c h of those
the
utterances
h e is p u r s u i n g the subgoals of imparting the specifications of the c o r r e s p o n d i n g c o m p o n e n t s .
U n l i k e t o p - l e v e l goals, subgoals can originate from the system as well as the user.
may
fail
to v o l u n t e e r
specifications
of certain parameters of
the s e r v i c e , or
may
A
user
specify
p a r a m e t e r s in an ambiguous or unsatisfiable w a y (see Section 7).
In such c a s e s , the s y s t e m
may
of
take
the
initiative
by
formulating
a p p r o p r i a t e i n f o r m a t i o n from the user.
and pursuing
a subgoal
trying
to
acquire
the
If the user of a restaurant r e s e r v a t i o n s s y s t e m , f o r
i n s t a n c e , d o e s not v o l u n t e e r the size of his party, the system will have to ask him e x p l i c i t l y .
S u c h s y s t e m g e n e r a t e d subgoals are, nevertheless, derived from and s u b o r d i n a t e t o t h e g o a l s
o f t h e u s e r , and s h o u l d not be pursued blindly if the goals of the user change.
T h e r e c a n b e s e v e r a l levels of subgoals; that is, subgoals can themselves h a v e
subgoals.
S o f a r , all o u r examples have b e e n in terms of the primary parameters of the s e r v i c e o f f e r e d
b y a simple service system:
subgoals on the part of the user to s p e c i f y a p a r a m e t e r t o t h e
s y s t e m , a n d s u b g o a l s o n the part of the system to obtain the specification of a p a r a m e t e r
f r o m the user.
In terms of the frames representation d e s c r i b e d in Section 4.3, the s u b g o a l s
h a v e b e e n t o fill the slots of the frame representing the service.
may themselves
not a l w a y s be achieved in one step.
H o w e v e r , these
If the system, for w h a t e v e r
subgoals
reason,
d o e s not u n d e r s t a n d all of a description of a parameter (slot) b y the user, it c a n set u p t h e
s u b g o a l s of o b t a i n i n g the missing part of the description from the user.
U: I w o u l d like the number for Jim <GARBLE>.
S: Jim w h o ?
42
In t h i s e x a m p l e (see Section 2.3), the user is pursuing the goal of t r y i n g to find a t e l e p h o n e
n u m b e r , i n c l u d i n g the subgoal of trying to identify to the system the p e r s o n f o r w h o m
wants
the
number.
This
subgoal
succeeds
only
in
part,
because
the
system
he
cannot
u n d e r s t a n d the surname, and so the system sets up a sub-subgoal by explicitly asking t h e
user
to
redescribe
the
surname.
Note
that
s u b o r d i n a t e to the user's original subgoal.
the
subgoal
generated
by
the
system
Other common examples of subgoals of
is
subgoals
o c c u r w h e n a u s e r ' s d e s c r i p t i o n of a slot is ambiguous or unsatisfiable (see S e c t i o n 7) a n d
t h e s y s t e m attempts to r e s o l v e the difficulty.
If a g o a l c a n have s e v e r a l subgoals which can in turn have subgoals of their o w n a n d s o
o n , t r e e s of s u b g o a l s of the t y p e common in planning systems (see Hayes [ 1 7 ] and S a c e r d o t i
[34], for example) can be produced.
to
plan
an e n t i r e
However, since a simple service s y s t e m d o e s not
c o n v e r s a t i o n out in advance, there is no need, as is usual in
need
planning
s y s t e m s , to g e n e r a t e complete trees of subgoals in advance (though see Section 6.4); i n d e e d ,
a n i n t e r a c t i o n in w h i c h the system insisted that all parameters w e r e s p e c i f i e d in a f i x e d o r d e r
(and
the
important
generated
generation
of
such a tree would imply this) would be quite
is the d e p e n d e n c y b e t w e e n subgoals and their parent goals.
ungraceful.
More
Since s u b g o a l s
in an attempt to fulfil higher-level goals, they must be a b a n d o n e d r a t h e r
are
than
b l i n d l y p u r s u e d if the parent goals are abandoned or changed.
. U: I w o u l d like the number for Mr. Smith.
S: Do y o u mean Jim Smith or Joe Smith?
U: I'm s o r r y , I mean F r e d Jones.
Irt t h i s e x a m p l e , the s y s t e m generates a subgoal of deciding b e t w e e n t w o a l t e r n a t i v e f i l l e r s
c o r r e s p o n d i n g to the ambiguous slot specification b y the user.
cooperate
However, the u s e r d o e s
not
w i t h the system's subgoal, but instead changes his o w n subgoal f r o m w h i c h
the
s y s t e m ' s s u b g o a l is d e r i v e d . This means that the system should now abandon its s u b g o a l a n d
a t t e m p t t o fulfil the user's revised subgoal.
Note that the commitment of the s y s t e m a l w a y s
t o c o o p e r a t e w i t h o u t any c o r r e s p o n d i n g commitment o n the part of the user means that s u c h
c h a n g e s in goals will a l w a y s originate with the user.
The
dependency
of subgoals o n the goals that generate them is not the o n l y
d e p e n d e n c y that n e e d c o n c e r n simple service systems.
form
Since parameters to s e r v i c e s c a n b e
i n t e r d e p e n d e n t , s u b g o a l s involved in establishing one slot can be d e p e n d e n t o n a n o t h e r
being
filled
or
remaining
unchanged.
For
example, the time, date, and p a r t y - s i z e
r e s t a u r a n t r e s e r v a t i o n are mutually constraining, so that changing one may r e q u i r e
another.
of
slot
of
a
changing
W o r k o n r e p r e s e n t a t i o n for these more general t y p e s of d e p e n d e n c i e s has
been
d o n e b y H a y e s [ 1 7 ] and Doyle [9].
T o e n d this s u b s e c t i o n , w e might comment on the general form of subgoals that a r i s e f o r
43
simple s e r v i c e systems.
c o m m o n l y , the
user
T h e most important are subgoals to impart or acquire
will have subgoals of imparting the specifications
for
information;
the
parameters
( s l o t s ) of the s e r v i c e , and the system will, when appropriate, generate subgoals f o r
s u c h s p e c i f i c a t i o n s from the user.
acquiring
L o w e r - l e v e l subgoals of this t y p e c o n c e r n p a r t s of t h e s e
s p e c i f i c a t i o n s , i.e. slots of frames which fill the slots in the main frame for the s e r v i c e (e.g.
t h e s u r n a m e f o r a name).
Subgoals
can
also
s p e c i f y i n g a slot.
concern
the
acquisition of
information
relevant
and
prerequisite
to
Thus, prior to specifying the time of a restaurant r e s e r v a t i o n , a u s e r might
a s k w h a t time the restaurant closed.
If the user was asking the question b e c a u s e he w a n t e d
a r e s e r v a t i o n as late as possible, this question would represent a subgoal of the s u b g o a l of
s p e c i f y i n g the time.
different
Since an informational question could also signal a s w i t c h to a c o m p l e t e l y
s u b g o a l , k n o w l e d g e of what information was relevant to w h i c h slot
specifications
w o u l d b e u s e f u l to the s y s t e m to enable it to keep track of which subgoals are c u r r e n t .
S: W h a t time w o u l d y o u like your reservation?
U: W h a t time do y o u close?
In
this
example
probably
it
would
be
desirable
for
the system to appreciate
that
the
user
was
c o o p e r a t i n g w i t h the subgoal it had generated, rather than i n t r o d u c i n g a d i s t i n c t
s u b g o a l of his o w n .
H o w e v e r , in general, arbitrary amounts of knowledge may b e r e q u i r e d t o
m a k e this d e t e r m i n a t i o n ; the user might, for instance, ask how far the restaurant w a s
t h e O p e r a House in t r y i n g to specify the time of his reservation.
from
T h e best s t r a t e g y may b e
t o r e c o g n i z e common examples of this type of subgoal as special cases, and a n s w e r all o t h e r
informational
questions
goal or subgoal.
w i t h the presumption that they represent subgoals of the
current
Note that from this perspective, general questions about the s y s t e m ' s a b i l i t y
c a n b e v i e w e d as subgoals of the user's top-level goal to make use of the system's s e r v i c e s .
O t h e r m e t h o d s of k e e p i n g track of goals and subgoals are discussed in Section 6.4.
In s u m m a r y , s u b g o a l s can arise for both the user and the system and t h e y c a n b e
s e v e r a l levels deep.
nested
T h e system should always cooperate with the subgoals e s t a b l i s h e d
by
t h e u s e r , but the user cannot be expected always to cooperate with subgoals e s t a b l i s h e d b y
t h e s y s t e m ; he may p u r s u e other subgoals independently, or change p r e v i o u s l y
g o a l s o r s u b g o a l s o n w h i c h the system's subgoal depended.
established
In either case, the s y s t e m s h o u l d
f o l l o w t h e u s e r ' s initiative rather than doggedly pursuing its o w n subgoals.
T h e most c o m m o n
f o r m of s u b g o a l c o n c e r n s the specification of a parameter of the service or of a s u b - s l o t of
s u c h a parameter.
specification.
A n o t h e r common type seeks information useful for e s t a b l i s h i n g s u c h
a
44
6.3. F o c u s
When
people
engage
s p e c i f i c t o p i c or f o c u s .
in conversation, their
attention is generally
directed
to
a
highly
This focus of attention is typically the same for all the p a r t i c i p a n t s ,
g i v i n g r i s e to the impression that they are "talking about the same thing".
A s h a r e d f o c u s of
a t t e n t i o n a l l o w s dialogue participants to economize substantially o n what t h e y s a y
through
u s e of e l l i p s i s and anaphora; they assume listeners can fill in the missing information b y u s e
o f t h e c o m m o n focus.
Since people make such economies of e x p r e s s i o n v e r y f r e q u e n t l y
and
n a t u r a l l y , a n y g r a c e f u l l y interacting system must be able to keep track of the s h i f t i n g f o c u s
of
its c o n v e r s a t i o n and use it to resolve the ellipsis and anaphora of its user (as w e l l
p e r h a p s g e n e r a t i n g its own).
are
the
topic
of
the
next
as
The problems involved in keeping track of c o n v e r s a t i o n a l f o c u s
section; in the
remainder
of
this section, w e
will
provide
an
e x p e d i e n t d e f i n i t i o n of focus for simple service domains, and show h o w s u c h a f o c u s c a n h e l p
r e s o l v e a n a p h o r a and ellipsis.
A d e f i n i t i o n of f o c u s that is both precise and general is hard to formulate.
Even though the
n o t i o n of f o c u s (or topic) has b e e n used in computational systems b y G r o s z [ 1 4 ] and S i d n e r
[ 3 7 ] as w e l l as in more traditional linguistic work, including some b y Grimes [ 1 3 ] and H o c k e t t
[ 2 2 ] , it h a s b e e n v i e w e d in different ways according to the r e s e a r c h goals i n v o l v e d .
for
instance,
modelled
dialogues
in
which
an
expert
guided
an
apprentice
Grosz,
through
m e c h a n i c a l a s s e m b l y task, and found it convenient to equate the focus of the d i a l o g u e
the
set
of
discussion.
objects
involved
in the
assembly
step currently
being
undertaken
or
a
with
under
Sidner, o n the other hand, dealt with monologues s p e c i f y i n g a r r a n g e m e n t s f o r
m e e t i n g , a n d s o took focus to be a complete single entity:
s o m e p a r a m e t e r of it, such as its time or place.
either a meeting as a w h o l e
a
or
In both cases, the entity o r e n t i t i e s in f o c u s
a r e u s e d t o h e l p r e s o l v e anaphoric references by providing a set of potential r e f e r e n t s
for
such references.
Nor
instead
will w e
we
will
attempt
again
to define focus in general, or e v e n for all of g r a c e f u l
confine
ourselves
to simple
service
systems.
For
interaction;
such
systems,
h o w e v e r , w e p r o p o s e an extension of the notion of focus by equating it w i t h t h e c u r r e n t l y
active subgoal.
This v i e w of focus subsumes the notion of a collection of entities t o w h i c h
a t t e n t i o n is c u r r e n t l y d i r e c t e d , since the current subgoal also defines a set of e n t i t i e s w h i c h
a r e c u r r e n t l y r e c e i v i n g attention - those involved in the subgoal.
This set can b e u s e d in t h e
s a m e w a y as just mentioned to provide potential referents for anaphora.
U s i n g t h e c u r r e n t subgoal as the focus also, however, helps in the r e s o l u t i o n of
ellipsis.
T h e e l l i p s e d information can o f t e n be filled in, via the assumption that the elliptical u t t e r a n c e
i s i n t e n d e d to fulfil the current
subgoal, thus providing a complete i n t e r p r e t a t i o n f o r
the
45
elliptical utterance.
For instance, an answer of "X" in response to a question of "Do y o u m e a n
X o r Y ? " c o u l d b e s e e n as fulfilling the subgoal set up by the question of c h o o s i n g b e t w e e n X
a n d Y.
I n t e r p r e t e d in the light of such a focused subgoal, the elliptical u t t e r a n c e , "X", has t h e
c o h e r e n t i n t e r p r e t a t i o n of choosing X instead of Y.
In o t h e r w o r d s , this e x t e n d e d notion of focus is useful in the same w a y as the m o r e u s u a l
o n e in t h e r e s o l u t i o n of anaphora, but unlike the other one, can also be used in the r e s o l u t i o n
of ellipsis.
T h i s latter use of focus appears to be novel, but see also the w o r k of L e v i n a n d
M o o r e [25].
potential
Of c o u r s e , it could be argued that the two aspects of focus - as a c o l l e c t i o n of
anaphoric
information
-
are
referents,
separate
and
as
a
goal
and should not
with
the
potential
for
providing
be dealt with in related w a y s .
ellipsed
We
believe,
h o w e v e r , that the t w o aspects are sufficiently intertwined for it to b e p r o f i t a b l e t o
treat
t h e m t h r o u g h a common mechanism.
A t this point, some c o n c r e t e examples will make it clearer how focus, as w e a r e
proposing
it, c a n h e l p w i t h a n a p h o r a and ellipsis.
U:
T d like the number for Mr. Smith.
S:
Do y o u mean Bill Smith or George Smith?
U:
I mean Bill.
H e r e t h e n e w u s e r r e f e r s anaphorically to Bill Smith by use of the first name. T h i s r e f e r e n c e
is e a s y t o r e s o l v e , h o w e v e r , because the focus is o n the user g e n e r a t e d s u b g o a l of filling t h e
p e r s o n slot of the d i r e c t o r y assistance frame, and more particularly o n the s y s t e m g e n e r a t e d
s u b g o a l of g e t t i n g the user to distinguish b e t w e e n two alternatives; this s u b g o a l i n v o l v e s t w o
e n t i t i e s a n d the r e f e r e n c e can easily be resolved to one of them.
A similar r e s o l u t i o n c o u l d
b e m a d e if t h e user had, for instance, said, "I mean the first one." Note that this
depends
resolution
u p o n the fact that the user's utterance does not change the c u r r e n t s u b g o a l
and
h e n c e t h e c u r r e n t focus.
If w e further change the example so that the user's r e p l y is j u s t
"Bill", or
this ellipsis can easily be resolved by assuming lhat the u s e r
"The
first
one
K
c o o p e r a t i n g in the f o c u s e d subgoal of choosing b e t w e e n the alternatives, so that the
m e n t i o n e d actually r e p r e s e n t s the user's choice.
is
object
Note that there is no n e e d to r e c o n s t r u c t a
p h r a s e s u c h as "I mean Bill", (or "I think it's Bill, "Bill is the one I want the number f o r " , etc.);
it is e n o u g h to i n t e r p r e t the referent as fulfilling the system's subgoal of g e t t i n g the u s e r t o
i n d i c a t e a c h o i c e (see Section 3.2).
C o n s i d e r also the f o l l o w i n g extract from the extended example of Section 9.2, in w h i c h t h e
t i m e o f a r e s t a u r a n t r e s e r v a t i o n is under discussion.
46
S:
I can't give y o u eight o'clock;
it would have to be s e v e n or
a f t e r nine thirty.
U: Y o u don't have anything b e t w e e n those times?
S: N o ! I'm afraid w e don't.
U: W e l l , the later time then.
The
system's
" i t " in the first line r e f e r s to the time of the r e s e r v a t i o n r a t h e r
o ' c l o c k ; this c o r r e s p o n d s to the focus of filling the time slot.
the
g o a l of
than
The system's u t t e r a n c e
eight
refines
filling this slot into a subgoal involving a choice on the part of the u s e r ,
and
" t h o s e t i m e s " in the user's r e p l y has to be interpreted in this context, since in p a r t i c u l a r , t h e
referenced
set
of
times
does
not
include
eight
o'clock.
The
user's
"anything"
is
less
s t r a i g h t f o r w a r d , h o w e v e r ; if a referent had to be found for it, a non-specific r e f e r e n c e t o a
t i m e w o u l d p r o b a b l y be best, but it is probably preferable to interpret the u t t e r a n c e as a
whole
as a s k i n g the s y s t e m to confirm its restriction o n times, without t r y i n g to
interpret
" a n y t h i n g " individually.
It thus sets up a subgoal on the part of the user, in terms of w h i c h
the
reply
system's
elliptical
is
easy
to
interpret.
However,
the
focus
on
the
choice
e s t a b l i s h e d in the system's first utterance is still available, and must be u s e d to i n t e r p r e t t h e
e l l i p s i s a n d a n a p h o r a in the user's second utterance.
The
i m p o r t a n c e of focus, then, is related to the commonness of a n a p h o r a a n d e l l i p s i s
n a t u r a l l a n g u a g e use, and they are v e r y common indeed.
in
Both allow a s p e a k e r t o e c o n o m i z e
o n w h a t he s a y s b y omitting unnecessary information that his listener can fill in b e c a u s e t h e i r
a t t e n t i o n is f o c u s e d o n the same common ground.
Since the user of a g r a c e f u l l y
interacting
s y s t e m w i l l n a t u r a l l y economize in this way, it is vital for the system to b e able t o u s e its
a w a r e n e s s of the conversational focus to make good the omitted information.
In
summary,
for
simple
service
systems
it
is convenient
to equate
the
focus
of
the
c o n v e r s a t i o n w i t h the subgoal that the user and the system are c u r r e n t l y c o o p e r a t i n g o n .
A n a p h o r i c r e f e r e n c e s can o f t e n be resolved into objects involved in the s u b g o a l c u r r e n t l y in
f o c u s , a n d e l l i p s i s c a n b e r e s o l v e d by assuming that the elliptical utterance is f u l f i l l i n g
the
f o c u s e d subgoal.
6.4. Keeping track of focus
S i n c e f o c u s is so important in the interpretation of ellipsis and anaphora, and s i n c e t h e s e
p h e n o m e n a a r e so commonplace, it is v e r y important for a gracefully interacting s y s t e m
k e e p t r a c k of the f o c u s as it shifts during the course of a conversation.
to
It is e a s y e n o u g h f o r
a s y s t e m t o tell w h e n the focus changes as a result of some n e w subgoal it e s t a b l i s h e s , b u t
w h a t a b o u t w h e n the user changes subgoals and therefore focus?
T h e p r o b l e m is a v e r y difficult one in general, and e v e n for simple s e r v i c e s y s t e m s , w e c a n
47
o n f y s u g g e s t a d i r e c t i o n for experimentation.
focus
The main problems are the u n p r e d i c t a b i l i t y
s h i f t s in terms of both w h e n they occur and what the new focus will b e .
As
Sidner
[ 3 7 ] o b s e r v e s , f o c u s shifts cannot be predicted in advance, they can only be d e t e c t e d
t h e y occur.
after
a
of
after
In the remainder of this section, w e will investigate what the n e w f o c u s c a n b e
shift
(the
answer
will
be
virtually
anything), and
suggest
some
approaches
to
d e t e c t i n g s u c h shifts.
A v e r y c o m m o n p a t t e r n of focus shift arises w h e n goals are r e f i n e d into s u b g o a l s , w h i c h in
t u r n s p a w n s u b g o a l s of their o w n . For example, in:
U: W h a t is the number for Bill Smith?
S: Did y o u s a y Bill or Phil?
U: Bill w i t h a ' B \
S: ' B ' . O.K. T h e number is 2597.
the
user
first
focuses
o n the p e r s o n parameter
(slot) of
the d i r e c t o r y
task (frame), t h e
s y s t e m t h e n c h a n g e s the focus to the first name of the p e r s o n and in particular to a c h o i c e
b e t w e e n t w o a l t e r n a t i v e s for that slot.
The user then cooperates w i t h the s y s t e m ' s g o a l a n d
c h o o s e s o n e of the alternatives, but in doing so refines the focus yet again to the f i r s t l e t t e r
of t h e f i r s t name.
In its r e p l y , the system first maintains the focus b y a c k n o w l e d g i n g
the
l e t t e r , a n d t h e n shifts the focus drastically by providing the user with the number he d e s i r e d .
T h e last f o c u s shift can be thought of as popping the nested set of foci that had b e e n built
up
t h r o u g h the p r e c e d i n g utterances, and then moving on to the next task - in this
performing
through
the
r e q u e s t e d service since its parameters are
now complete.
Popping
case
back
all the n e s t e d levels and going on to the next major subgoal is p r o b a b l y t h e most
c o m m o n w a y of terminating such nesting, but partial popping can also occur:
U: W h a t is the number for Mr. Smith?
S: Do y o u mean Bill Smith or Joe Smith?
U: I mean Jim Smythe.
H e r e t h e u s e r d o e s not c o o p e r a t e with the system's subgoal of distinguishing b e t w e e n
two
a l t e r n a t i v e s , but instead p o p s back to the previous level and retries his goal of s p e c i f y i n g t h e
p e r s o n slot to the s y s t e m .
Focus
user
can
s h i f t s d o not, h o w e v e r , always follow this nice stack discipline.
at
any
time
reopen
a focus
that was apparently
For i n s t a n c e , t h e
closed, either
as
a result
s o m e t h i n g the s y s t e m tells him which might make such a shift more p r e d i c t a b l e , as in:
U: I w o u l d like a r e s e r v a t i o n for two tonight.
S: W h a t time w o u l d that be for?
U: A b o u t s e v e n .
S: I'm a f r a i d the earliest I could give y o u is eight thirty.
U: W h a t about t o m o r r o w night?
of
48
o r f o r r e a s o n s b e s t k n o w n to himself:
U: I w o u l d like a r e s e r v a t i o n for a party of six tonight.
S: What time w o u l d that be for?
U: N o ! T h e r e will be s e v e n of us.
Even
worse,
the
user
can
shift
the
focus
to
subgoals
of
previously
open
foci,
which
t h e m s e l v e s h a v e n e v e r b e e n open, as in:
U: What is the number for Bill Smith?
S: T h e number for Bill Smith is 2597.
U: N o ! I s a i d Phil w i t h a *P* as in parrot.
w h e r e t h e u s e r not only r e o p e n s the focus on specifying the p e r s o n , but jumps immediately
i n t o s p e c i f y i n g the first name slot.
Note that the specification for the surname remains
the
s a m e , w h i c h is a d d e d e v i d e n c e that the old focus of filling the p e r s o n slot is r e o p e n e d
and
then refined.
T h e s e e x a m p l e s suggest that at any time there is a tree of potential f o c i , Le,
subgoals
w h i c h may b e c o m e the current focus; for a simple service system, the root of the t r e e w o u l d
b e t h e t o p - l e v e l goal of getting the service performed, the next level w o u l d b e the g o a l s of
s p e c i f y i n g the main parameters or slots of the service, the next level, s p e c i f y i n g t h e i r
etc
slots,
T h e s t a c k - l i k e p a t t e r n of focus shifts, described above as the most common t y p e , t h e n
c o r r e s p o n d s to d e p t h - f i r s t shifts d o w n one branch of the tree, f o l l o w e d b y a r e t u r n u p t h e
branch
and
a
transfer
to
the
highest
node
of
another
branch.
Reopening
of
old
foci
c o r r e s p o n d s to visiting a node in the tree more than once, and shifting f o c u s to a- s u b g o a l of
a p r e v i o u s l y o p e n focus, w h i c h itself has never been o p e n , c o r r e s p o n d s to visiting f o r
the
f i r s t time a n o d e w h o s e parent has been visited before.
In t h e
examples
above, the shifts across branches always involved shifts to
completely
d i f f e r e n t b r a n c h e s , since they involved changes in which top-level parameter of the
was being considered.
service
Shifts can also occur across branches which meet b e l o w t h e r o o t of
t h e t r e e , as the f o l l o w i n g example shows.
U: What is the number for Mr. Smith?
S: Do y o u mean Joe Smith or F r e d Smith?
U: I mean the Smith o n the eighth floor.
H e r e t h e u s e r d o e s not c o o p e r a t e with the system's subgoal of distinguishing b e t w e e n
the
S m i t h s b y f i r s t name, but instead chooses to distinguish them b y location, a s i b l i n g s u b g o a l o f
t h e g o a l of filling the p e r s o n slot.
F i n a l l y , a u s e r c a n always completely abandon his top-level goal, and thus move t h e f o c u s
e n t i r e l y out of the t r e e , as in:
49
U:
I w o u l d like a r e s e r v a t i o n for two this evening.
S:
It w o u l d have to be after ten.
U:
I'll t r y s o m e w h e r e else.
N o t e a l s o that in this example the user covers two subgoals in his first utterance; the s y s t e m
must b e p r e p a r e d to deal w i t h problems in one of them without forgetting the o t h e r .
T h e net c o n c l u s i o n from all these examples is that while there are c e r t a i n common t y p e s o f
focus
shift,
which
follow
a
stack
discipline
corresponding
to
progress
through
the
s p e c i f i c a t i o n of the s e r v i c e and the several levels of detail that can be i n v o l v e d , f o c u s
shift
to virtually
can
any part of the tree of potential foci, including shifts to o l d , a p p a r e n t l y
c l o s e d , f o c i o r t o s u b p a r t s of such old foci that have never themselves b e e n o p e n .
P r e v i o u s w o r k o n k e e p i n g track of focus has been based on more r e s t r i c t i v e
assumptions.
M o d u l o t h e i r d i f f e r e n t v i e w s of focus, both Grosz and Sidner have assumed that f o c u s s h i f t s
i n a q u i t e o r d e r l y w a y t h r o u g h a tree of potential foci; in particular, t h e y assumed that
no
n o d e o r b r a n c h of the t r e e was visited more than once, and in some cases, that the b r a n c h e s
o f t h e t r e e w e r e t r a v e r s e d in a partially fixed order.
T h e y then d e t e c t e d f o c u s shift
from
m e n t i o n of e n t i t i e s that belong in foci either further d o w n or further up the same b r a n c h of
t h e t r e e as the c u r r e n t focus, or in a focus on one of the permissible next b r a n c h e s .
have
We
n o immediate solution for the detection of more general t y p e s of focus shift; a g o o d
s t a r t i n g point might be, in the same way as Grosz and Sidner, to s e a r c h for a p p r o p r i a t e
foci
w h e n t h e e n t i t i e s or goals mentioned do not fit in the current focus, but to s e a r c h o v e r
the
e n t i r e t r e e of p o s s i b l e foci for the task domain.
In addition, since old foci can b e
reopened
a n d s p e c i f i c o b j e c t s associated with them reused, the relevant information must b e a s s o c i a t e d
w i t h t h e n o d e s that have already b e e n traversed.
6.5. Summary of Goals and Focus
W e c a n summarize the main points made in this section as follows:
-
O r d i n a r y human c o n v e r s a t i o n is highly goal oriented; people almost a l w a y s
p u r s u i n g o n e or more goals in saying what they say.
are
-
A g r a c e f u l l y interacting simple service system need have no high-level goals of
its o w n ; it merely c o o p e r a t e s with the user in fulfilling his.
-
T h e o n l y goals such a system need recognize in its user are those it c a n h e l p
s a t i s f y ; this p r e c l u d e s the need for a sophisticated model of t o p - l e v e l goals.
-
S u b g o a l s must be treated with greater care; they can be n e s t e d a r b i t r a r i l y
d e e p l y and originate from both the system and the user, but the s y s t e m
g e n e r a t e d s u b g o a l s are always derived from and subservient to t h o s e of t h e
user.
50
C o m m o n t y p e s of subgoal concern the specification of parameters of the s e r v i c e ,
o r o n e of their subslots; others seek information useful for establishing s u c h
specifications.
P e o p l e use a s h a r e d v i e w of what a conversation is about, called the f o c u s of t h e
c o n v e r s a t i o n , to economize substantially on what they say t h r o u g h a n a p h o r a a n d
ellipsis.
F o r a simple s e r v i c e system, viewing focus as the currently active s u b g o a l h e l p s
in t h e r e s o l u t i o n of b o t h anaphora and ellipsis.
A l t h o u g h t h e r e are some common patterns of focus shifts, the f o c u s of a
c o n v e r s a t i o n has the potential to shift unpredictably to any of a t r e e of p o t e n t i a l
foci.
7. Identification from Descriptions
7.1. The Identification Problem
A
fundamental
component
of
human conversation is the ability of
a listener
to
use
a
s p e a k e r ' s d e s c r i p t i o n of a previously memorized entity (object or event) to i d e n t i f y and r e c a l l
the entity.
If a definite focus of attention has already been established in the c o n v e r s a t i o n ,
t h e n as w e s a w in Section 6.3, descriptions can be quite cryptic or anaphoric (the w o m a n ; t h e
s e c o n d o n e ; it). On the other hand, if no context has been established, or if the o b j e c t t o b e
d e s c r i b e d is not in the established context, much fuller forms of descriptions must b e
(the
camping
trip w e
went
used
on last Easter; a w o r d beginning with ' 0 ' meaning f a w n i n g
s e r v i l e ; t h e lady that I s a w y o u with last night).
or
One of the more remarkable f e a t u r e s
h u m a n m e m o r y is the ability to identify from a suitable description virtually any
of
previously
e x p e r i e n c e d or k^own object or event (that man we met at the s e c o n d hotel w e s t a y e d at
w h e n w e w e n t o n vacation last year).
W h e n he d e s c r i b e s an entity, a speaker aims to identify the entity uniquely to his l i s t e n e r .
To
do
this,
he
uses
his
beliefs
about
the listener's
state of
knowledge
to
construct
a
d e s c r i p t i o n that is informative enough to identify the object uniquely to the listener ( p o s s i b l y
w i t h r e s p e c t to a c u r r e n t context).
Most usually he succeeds, but sometimes a l i s t e n e r
will
r e c a l l m o r e t h a n o n e entity fitting the description, or may not be able to i d e n t i f y a n y e n t i t y
filling the description.
In other words, there are three possible outcomes to any attempt t o
i d e n t i f y an e n t i t y t h r o u g h a description:
u n i q u e - o n l y one entity fits the description
a m b i g u o u s - more than one entity fits the description
u n s a t i s f i a b l e - no e n t i t y fits the description
51
In a c c o r d a n c e w i t h the Principle of Implicit Confirmation (Section 2.2), a s p e a k e r
that
assumes
his communication is successful, and in particular that his d e s c r i p t i o n s i d e n t i f y
e n t i t i e s f o r his listener, unless the listener indicates otherwise.
unique
The listener thus n e e d
do
n o t h i n g if a d e s c r i p t i o n falls into the unique case, but must indicate a p r o b l e m to the o r i g i n a l
speaker
will
in the ambiguous or unsatisfiable cases.
usually
In indicating such a p r o b l e m , the
listener
initiate a dialogue during which the description is r e v i s e d to s p e c i f y a u n i q u e
e n t i t y t o the listener.
Identification
from
descriptions
is
clearly
very
important
for
a gracefully
interactions
s y s t e m w h i c h must identify entities from its user's descriptions, and must in t u r n
t h i n g s to its u s e r .
describe
A d i r e c t o r y assistance must be able to accept d e s c r i p t i o n s of its l i s t i n g s ,
a n d in t u r n d e s c r i b e the c o r r e s p o n d i n g numbers to its user; and a restaurant
reservation
s y s t e m must b e able to accept and give out descriptions of times, dates, p a r t y s i z e s , etc..
F o r t u n a t e l y , identification from descriptions for gracefully interacting s y s t e m s , d o e s
not
i n v o l v e the q u i t e a w e - i n s p i r i n g range of description and recall found in human c o n v e r s a t i o n .
In s i m p l e
s e r v i c e systems, in particular, entities that the system must r e c o g n i z e
from
the
u s e r ' s d e s c r i p t i o n s are essentially the entities that serve as parameters to the s e r v i c e , a n d
these
typically
fall
into
a few
highly
restricted categories
such as listings
for
directory
a s s i s t a n c e , times, dates, p a r t y sizes, and names for restaurant r e s e r v a t i o n s , and m a i l b o x e s ,
h o s t s , a n d m e s s a g e s for electronic mail systems.
O n c e a d e s c r i p t i o n has b e e n obtained from a user, identification as an e n t i t y k n o w n to t h e
system
member
is also r e l a t i v e l y simple; since each parameter
of a finite class k n o w n to the system.
to the service t y p i c a l l y
must
be
a
A directory assistance s y s t e m , f o r e x a m p l e ,
w o u l d k n o w the set of all listings that it could consult, and a restaurant r e s e r v a t i o n s s y s t e m ,
the
d a t e , time, and p a r t y size must be a member of a finite set.
T h e name in w h i c h
a
r e s e r v a t i o n is made is an exception to this rule.
E v e n t h o u g h the range of entities that can be described, and the method of
identifying
e n t i t i e s f r o m their d e s c r i p t i o n s is much simpler than in general human c o n v e r s a t i o n , t h e
results
of
attempting identification from descriptions can be classified into the same
categories:
unique, ambiguous, and unsatisfiable.
A
gracefully
interacting
end
three
system
must,
t h e r e f o r e , b e able to participate o n either side of a dialogue for c l a r i f y i n g u n s a t i s f i a b l e
a m b i g u o u s d e s c r i p t i o n s ; these dialogues will be the subject of the following s u b s e c t i o n s .
or
Our
e x a m p l e s and analysis in these subsections will be in terms of simple s e r v i c e domains, a n d n o
claims
are
protocols
made
of
beyond
this, Many of
the strategies
presented
are, h o w e v e r , b a s e d
human conversations, and have much relevance to broader c l a s s e s of
communication.
We
know
of
no other
work
that has attempted to analyse
this
on
human
type
of
52
clarification dialogue.
7.2. A m b i g u o u s Descriptions
When
a speaker
interpret
describes
an entity to a listener
in such a w a y
that the l i s t e n e r
can
the d e s c r i p t i o n as r e f e r r i n g to more than one entity, the d e s c r i p t i o n is s a i d t o b e
ambiguous.
In o r d e r for the speaker's attempt at description to s u c c e e d in s u c h a s i t u a t i o n ,
a n c l a r i f y i n g dialogue must ensue to resolve the ambiguity.
strategies
involved
in s u c h
In this section w e i n v e s t i g a t e t h e
a resolution, restricting ourselves
to the
case
in
which
the
T h e s i m p l e s t s t r a t e g y for ambiguous descriptions is to inform the user of the o p t i o n s
and
d e s c r i p t i o n s are g i v e n to a gracefully interacting simple service system b y its user.
a s k him t o make a d e c i s i o n b e t w e e n them as in:
U: What is the number for Smith?
S:
Do y o u mean Jim Smith or Joe Smith?
N o t e that in distinguishing the options by the first name of the p e r s o n , the s y s t e m r e f i n e s t h e .
f o c u s of t h e c o n v e r s a t i o n to the corresponding slot in the description of a p e r s o n .
S o m e t i m e s the alternatives may be too many to list and in this case the s y s t e m c a n fix o n a
d i s t i n g u i s h i n g f e a t u r e of the alternatives and ask the user about it.
U: What is the number for Smith?
S: T h e r e are 5 9 Smiths listed. Do y o u know the first name?
W h i l e u n a v o i d a b l e in this case, this strategy has the disadvantage of inviting a w i d e r
o f r e p l y t h a n the s t r a t e g y of explicitly listing the alternatives.
variety
In g e n e r a l , some m e t h o d f o r
w e i g h i n g the r e l a t i v e advantages of the various strategies of clarification a c c o r d i n g t o
p r e v a i l i n g c i r c u m s t a n c e s may be needed.
distinguishing
the
Both strategies, however, change the f o c u s t o b e a
aspect (slot) of the entities fitting the original ambiguous d e s c r i p t i o n .
This
f o c u s s h i f t s e r v e s as the basis for understanding a continuation of:
U: I think it starts w i t h a P ...
H
M
t o t h e a b o v e example.
T h e r e is, of c o u r s e , no guarantee that the user will choose to follow the s y s t e m ' s shift of
focus.
He may i n s t e a d t r y to resolve the ambiguity by giving distinguishing i n f o r m a t i o n of his
o w n c h o o s i n g ; in particular, b y further specifying an aspect of the entity d i f f e r e n t f r o m t h e
o n e c h o s e n b y the s y s t e m .
U:
I don't k n o w the first name, but the address is Willow Crescent.
T h i s c o r r e s p o n d s to a shift of focus to a sibling node in the tree of p o s s i b l e f o c i .
In a n y
c a s e , t h e s y s t e m must be able to take whatever distinguishing information the u s e r g i v e s it
53
a n d a p p l y it to the r e s o l u t i o n of the ambiguity.
If the ambiguity is still not r e s o l v e d , y e t m o r e
i n t e r a c t i o n must take place w i t h the user.
Instead
of
trying
to
resolve
the
ambiguity,
the
user
may
also
change
the
original
description, giving
U:
as
a
No!! I meant Smythe.
possible
c o m p l e t e l y , or
continuation
reiterate
the
for
the
last
example,
abandon
ambiguous description.
the
These cases
attempt
at
identification
must
all b e
recognized
for
ambiguous
s e p a r a t e l y , a n d a p p r o p r i a t e responses given.
Occasionally,
for
domain dependent
reasons, one of
the
d e s c r i p t i o n may be much more likely than any of the others.
alternatives
an
In such cases, the a l t e r n a t i v e
m a y b e o f f e r e d as a y e s / n o option to the user without r e f e r r i n g to the o t h e r s .
Suppose
r e s e r v a t i o n s a r e available only after 9:30:
U: P d like a r e s e r v a t i o n for two tonight at 8:30 or later.
S: W o u l d 9:30 b e OK?
E v e n t h o u g h the user's specification c o v e r e d 8:30 and all later times, 9:30 is the time
t h o s e a v a i l a b l e that the user is most likely to prefer.
of
A similar s t r a t e g y can b e f o l l o w e d if
t h e u s e r is u n l i k e l y to be c o n c e r n e d about which of the alternatives is c h o s e n .
There
may
also
be
r e f e r e n t o v e r another.
chose
an
system,
Again,
alternative
for
user
dependent
reasons for
of
one
alternative
If the system has previous experience of a particular user, it might
w h i c h it knows the user normally p r e f e r s .
instance, might have knowledge of
knowledge
the system to p r e f e r
the current
b e t w e e n t w o alternatives.
goals of
the time a client
a user
might
provide
A restaurant
usually
reservation
prefers
some clue
to
to
his
dine.
choice
If a restaurant system knew, for instance, that the user w i s h e d t o
d i n e a f t e r the t h e a t r e , it might select an alternative reservation time that c o n f o r m e d t o that
constraint.
W e have not, h o w e v e r , considered the significant problems i n v o l v e d in u s i n g a n d
a c q u i r i n g i d i o s y n c r a t i c information, or in modelling and using an o p e n - e n d e d set of r e a l - w o r l d
goals.
7.3. Unsatisfiable Descriptions
In this s e c t i o n w e consider descriptions from which a listener cannot identify any e n t i t y h e
knows
about.
unsatisfiability
description.
The
as
basic
well
as
human
strategy
indicating
any
in such
cases
near-misses,
seems
i.e. entities
to
be
that
to
indicate
"almost"
fit
the
the
A s b e f o r e w e will discuss unsatisfiable descriptions assuming that t h e l i s t e n e r is
a g r a c e f u l l y i n t e r a c t i n g simple service system and the speaker is its user.
N l V l R S
CARNlGK-MtUOM «
"*
PITTSBURGH. PENNSYLVANIA 15211;
54
When
a user's
description
is unsatisfiable
by
any entity
consistent
with
the
system's
c o n s t r a i n t s , the basic s t r a t e g y must be to inform the user of the problem and e x p e c t him t o
e i t h e r c h a n g e the d e s c r i p t i o n so that it is satisfiable or give up.
should
provide
unsatisfiable.
In doing this, the
system
the user w i t h as complete a picture as possible of w h y the d e s c r i p t i o n
This
information will help the user
is
in reformulating his d e s c r i p t i o n , a n d , i n
p a r t i c u l a r , h e l p him avoid giving another description that is faulty.
U: What is the number for Mr. Jim Smith?
S: I'm s o r r y ! T h e r e is no listing for any Smith with initial 7 .
U:
S:
I'd like a r e s e r v a t i o n for ten people tonight.
I'm s o r r y ! w e can't accommodate parties larger than seven..
U:
S:
I'd like a r e s e r v a t i o n for two for tonight.
I'm a f r a i d the earliest reservation is next Thursday.
Implicit
which
in t h e s e explanations of unsatisfiability is the search for
nearly
fit
the
description.
It is considered
helpful
for
"near-misses",
humans
n e a r - m i s s e s , a n d a g r a c e f u l l y interacting system should do the same.
to
entities
suggest
Of c o u r s e , a n y
such
metric
f o r m e a s u r i n g w h e t h e r an entity is a near-miss must be quite domain d e p e n d e n t .
The metric
may
goals
also be dependent
distantly
related
to
on the idiosyncracies of a particular
the
domain of
user, or o n user
the system, in much the same w a y
as the
only
automatic
s e l e c t i o n b e t w e e n alternative possibilites for ambiguous descriptions d i s c u s s e d at t h e e n d o f
t h e last s e c t i o n .
A s also mentioned there, w e have not attempted to address the f o r m i d a b l e
p r o b l e m s that t h e s e topics raise.
Near-misses
distant
may
near-misses).
be
either
They
unique
(one
near-miss) or
can be treated similarly
ambiguous
to the
basic
(several
unique
and
comparably
ambiguous
d e s c r i p t i o n c a s e s , e x c e p t that it must be made plain that the d e s c r i p t i o n w a s s a t i s f i e d b y n o
e n t i t y as it s t o o d .
In addition, the unique case must be p r e s e n t e d in a y e s / n o q u e s t i o n t o t h e
u s e r r a t h e r t h a n assumed.
U: W h a t is the number for Joe Smith?
S: W e don't have a listing for Joe Smith. Do y o u mean Jim Smith?
U: What is the number for Joe Smith?
S: W e don't have a listing for Joe Smith. There's one for
Jim Smith or P e t e Smith or one for Joe Smythe.
In a d d i t i o n to the r e p l i e s to the basic ambiguous case discussed above, the n e a r - m i s s
case
c a n r e s u l t in r e p l i e s in w h i c h the user claims that the near-miss was what he s a i d in the f i r s t
p l a c e , o r in w h i c h he r e p e a t s (possibly in the negative, "but there isn't any J o e Smith?") t h e
original description.
55
7.4. D e s c r i p t i o n s and Faulty Comprehension
In d i s c u s s i n g the r e p l i e s that a gracefully interacting system should make to a m b i g u o u s o r
u n s a t i s f i a b l e d e s c r i p t i o n s , w e have assumed that the system c o m p r e h e n d e d the
b o t h fully and accurately.
descriptions
A partially or uncertainly comprehended d e s c r i p t i o n c a n , of c o u r s e ,
b e c l a s s i f i e d in the same w a y as unique, ambiguous, or unsatisfiable, but the w a y in w h i c h t h e
system
reacts
to
comprehension.
such
We
descriptions
can
distinguish
should
the
be
two
modified
in
subcases
of
the
presence
incomplete
of
and
faulty
uncertain
comprehension.
If t h e s y s t e m ' s c o m p r e h e n s i o n of a description is incomplete, so that some a s p e c t s of
description
ought
not
are
to
c o m p r e h e n d e d , while others are not, then the system's
be
modified depending on the classification of
the
reaction ought
the d e s c r i p t i o n .
If
or
such
d e s c r i p t i o n is ambiguous and the missing part of the description might p o s s i b l y r e s o l v e
a
the
a m b i g u i t y , t h e n the s y s t e m should assume that it would, and ask the user a s u i t a b l e c l a r i f y i n g
question.
U: W h a t is the number for <GARBLE> Smith?
S: Did y o u s a y Jim Smith or Joe Smith?
If, o n t h e o t h e r hand, the uncomprehended part of the description could not d i s t i n g u i s h t h e
two
a l t e r n a t i v e s , t h e n the description should be treated in the same w a y
ambiguous description.
as an
ordinary
F o r example, if there is a Jim Smith and a J o e Smith, b o t h w i t h t h e
s a m e title, t h e n the f o l l o w i n g sequence is appropriate.
U: What is the number for <GARBLE> J . Smith?
S: Do y o u mean P r o f e s s o r Jim Smith or Professor Joe Smith?
In the
further
the
same
w a y , if
we
assume
that uncomprehended
parts of
a description
can
only
r e s t r i c t the class of entities to which a description could refer, t h e r e is no p o i n t in
system's
trying
to
determine
the
uncomprehended
part
of
a
description
whose
c o m p r e h e n d e d part is already unique or unsatisfiable; for instance, there is no point in t r y i n g
t o d e t e r m i n e the g a r b l e d part of "Professor <GARBLE> Smith" if only one P r o f e s s o r Smith is
known
to
the
system.
Of
course,
there
are
exceptional
cases
(negatively
phrased
d e s c r i p t i o n s ) in w h i c h this assumption does not hold, but normally it will allow the s y s t e m t o
i g n o r e t h e u n c o m p r e h e n d e d part without ill-effect (see Section 2.4).
Now
we
turn
interpretable;
the
to
the
uncertainty
possible interpretations.
on
the
case
classification
of
in
can
which
be
a
about
description
one
is
completely,
interpretation
or
be
but
uncertainly,
between
several
The action a system should take in such cases d e p e n d s v e r y
the
uncertain
interpretations
as
either
unique,
much
ambiguous,
or
56
unsatisfiable.
unsatisfiable,
If o n e of
then
this
the alternatives is unique, while the others
interpretation should be raised in certainty
are all a m b i g u o u s
(on the g r o u n d s
or
that
p e o p l e t r y to g i v e unique descriptions), possibly to the extent that the s y s t e m w o u l d w a n t t o
a c c e p t this i n t e r p r e t a t i o n over ail the others (perhaps checking it w i t h an e c h o s e e
2.4).
In g e n e r a l , b e i n g
unique
will tend to raise a description
Section
in c e r t a i n t y , w h i l e
being
a m b i g u o u s w i l l t e n d to l o w e r it, and being unsatisfiable will tend to l o w e r it e v e n m o r e .
The
p i c t u r e c h a n g e s , h o w e v e r , if an unsatisfiable alternative has near-misses, e s p e c i a l l y if it has
just one near-miss.
In such a case it might be fruitful to re-analyse the input to s e e if t h e
a l t e r n a t i v e c o u l d be r e i n t e r p r e t e d as a direct description of the near-miss e n t i t y .
If it c o u l d ,
t h e n t h e r e w o u l d be reasonable grounds for assuming that it was the i n t e r p r e t a t i o n the u s e r
originally intended.
The p r o c e s s of mapping descriptions onto entities must thus b e
closely
c o o r d i n a t e d w i t h the p r o c e s s of analysing the descriptions linguistically.
7.5. Summary of Identification from Descriptions
W e c a n summarize the main points of this section as follows:
-
A f u n d a m e n t a l l y important important part of human conversational skills is
a b i l i t y of a listener to identify entities from a speaker's d e s c r i p t i o n of them.
the
-
In the g i v e n context, such descriptions may be uniquely s p e c i f y i n g , ambiguous, o r
u n s a t i s f i a b l e b y any entity known to the listener.
-
In t h e ambiguous case, the speaker should be informed of the alternatives o r
a s k e d f o r f u r t h e r distinguishing information, but there is no guarantee that he
w i l l c h o o s e to make the distinction in the w a y the listener suggests.
-
In t h e unsatisfiable case, the listener should mention "near-misses", entities that
almost fit the d e s c r i p t i o n .
-
If the d e s c r i p t i o n w a s only partially or uncertanly comprehended b y the l i s t e n e r ,
t h e s t r a t e g i e s he uses for ambiguous and unsatisfiable descriptions s h o u l d b e
m o d i f i e d in a number of detailed ways.
8. Language Generation
T h e e x t r a o r d i n a r y ability of humans to make sense out of the most c o n v o l u t e d and o b s c u r e
l a n g u a g e f o r m s makes language generation one of the less critical components of
interaction;
even
quite
clumsy
utterances
by
u n d e r s t o o d w i t h o u t great difficulty by its user.
a gracefully
interacting
system
Nevertheless, a user cannot
be
graceful
could
be
presumed
u p o n t o o much to compensate for a system's generative deficiencies; if a s y s t e m ' s u t t e r a n c e s
a r e t o o c l u m s y , t h e y w i l l not appear natural; if they are over-detailed o r too long w i n d e d , t h e
u s e r may b e c o m e f r u s t r a t e d ; in either case, the interaction will be less g r a c e f u l t h a n it c o u l d
57
be.
In t h e r e m a i n d e r of this section, we discuss briefly some of the c h a r a c t e r i s t i c s of
language
that
a gracefully
including
contextually
several
different
echoes
with
interacting
dependent
system should imitate
(anaphoric) references
in o r d e r
to
appear
and ellipsis, and the
output.
We
will
also
discuss
generation
natural,
inclusion
messages (illocutionary acts) in the same utterance, e s p e c i a l l y
other
human
considerations
of
combining
specific
g r a c e f u l l y i n t e r a c t i n g systems; in particular, the need to restrict the complexity of
to
generated
u t t e r a n c e s in a c c o r d a n c e w i t h analytical ability.
W h e n a human d e s c r i b e s some entity, he does not normally incorporate all that he
Knows
a b o u t t h e e n t i t y into the description; rather, as Grice [12] has pointed out, he n o r m a l l y u s e s
s u f f i c i e n t i n f o r m a t i o n to identify the entity to his listener, but no more.
T h u s , if a s p e a k e r
w a n t e d t o d e s c r i b e a black desk, he might just say "the desk" if he thought that d e s c r i p t i o n
w a s s u f f i c i e n t f o r his listener to understand what he meant.
On the other hand, if he t h o u g h t
t h e r e w a s some danger of confusion with a different desk (a b r o w n one, say), he might s a y
"the black desk".
T h e w a y an entity is described in human conversation, then, d e p e n d s
not
o n l y o n t h e e n t i t y and the speaker's knowledge of it, but also on the s p e a k e r ' s b e l i e f s a b o u t
his
listener's
stale
of
knowledge
and focus of
attention.
An
appropriate
description
is
t y p i c a l l y t h e least informative one that is sufficient to identify the d e s c r i b e d e n t i t y r e l a t i v e t o
t h e l i s t e n e r ' s c u r r e n t focus of attention.
research
to
generating
descriptions
Little attention has b e e n paid in natural
appropriate
in this sense, except
for
the
language
description
g e n e r a t i o n s y s t e m of L e v i n and Goldman [26],
The
problem
descriptions
conventions,
is,
nevertheless,
important
for
a
gracefully
interacting
system;
s u c h a s y s t e m uses to identify objects to its user do not c o n f o r m t o
the
user
will
find
them
unnatural
and
ungraceful.
Thus,
if
a
if
the
human
restaurant
r e s e r v a t i o n s s y s t e m w i s h e s to offer its user a reservation on Wednesday at 7pm, and if t h e
u s e r has a l r e a d y made it clear that he wants a reservation on W e d n e s d a y in the e v e n i n g , t h e
s y s t e m n e e d o n l y and should only describe the reservation by " 7 " (as in "Would 7 b e o k ? ) .
w
Using
these
conventions
leads
to
the
generation
of
anaphoric
references.
A
similar
c o n v e n t i o n (of not giving more information than one has to) can result in elliptical u t t e r a n c e s ,
e.g. r e p l y i n g "Jim Smith" instead of "I mean Jim Smith" in response to "Do y o u mean Jim Smith
or
Fred
Smith?".
It
is clearly
desirable
for
a gracefully
interacting
system
to
use
k n o w l e d g e of the goal s t r u c t u r e and focus of attention of a c o n v e r s a t i o n to g e n e r a t e
its
such
e l l i p s e s a n d a n a p h o r a as well as analyse them (see Section 6.3).
Humans
often
use
a single utterance to serve several different
purposes
(or
following
S e a r l e ' s t e r m i n o l o g y [36] to p e r f o r m several illocutionary acts). A p e r s o n might, f o r i n s t a n c e ,
58
a s k if a w i n d o w w a s o p e n , both to indicate that he was cold and to ask his l i s t e n e r t o s h u t
a n y o p e n w i n d o w , as well as to find out whether a window was indeed o p e n .
A t a m u c h l e s s s o p h i s t i c a t e d level, an ability to combine more than one i l l o c u t i o n a r y act in
a s i n g l e u t t e r a n c e is also useful for a gracefully interacting system.
In particular, s u c h an
a b i l i t y is u s e f u l for the kind of indirect echoing described in Section 2.4.
a
listener
who
is unsure
of
his interpretation of
u n c e r t a i n i n t e r p r e t a t i o n into his reply.
the
interpretation
is implicitly
a speaker's
In indirect e c h o i n g ,
utterance
incorporates
his
If the original speaker does not comment o n t h e e c h o ,
confirmed.
Thus, if a restaurant
reservations
system
was
u n s u r e a b o u t its r e c o g n i t i o n of " s e v e n " in:
I'd like a r e s e r v a t i o n for s e v e n people,
a suitable r e p l y would be:
W h a t time w o u l d y o u r party of s e v e n like to eat?
T h e s u b s t i t u t i o n of "your party of s e v e n " for the more normal " y o u " r e p r e s e n t s an i n d i r e c t
e c h o of t h e s y s t e m ' s uncertain interpretation.
Of course, using indirect e c h o e s c a n (and in
t h i s e x a m p l e did) violate the convention discussed above of using descriptions o n l y minimally
sufficient
for
identification.
However,
this
appears
to be
acceptable
p r e s u m a b l y b e c a u s e of the clarification purposes thus s e r v e d .
in human
dialogue,
Similarly, indirect e c h o e s c a n
a l s o r e p l a c e e l l i p s e s w i t h fuller forms, as in:
U:
I w a n t to go to P i t t s b u r g h on May 17.
S: What time o n M a y 17 do y o u want to go to Pittsburgh?
i n s t e a d of:
S: W h a t time do y o u want to go?
i n w h i c h t h e d a t e and destination are ellipsed.
B e f o r e e n d i n g this section, w e must consider an aspect of generation that is much m o r e
important
for
gracefully
interacting
systems
than
in
general
human
conversation.
As
m e n t i o n e d in S e c t i o n 2.3, the form of a speaker's utterance tends to influence the f o r m of t h e
listener's response.
limited
ability
to
Thus it is vitally important for a gracefully interacting s y s t e m w i t h a
comprehend
language
g e n e r a t e s w i t h its r e c e p t i v e r e p e r t o i r e .
to co-ordinate
the questions
and o t h e r
This co-ordination has t w o p a r t s :
output
it
first, a s y s t e m
s h o u l d t r y to r e s t r i c t the content of the user's reply by asking questions that are as s p e c i f i c
and
directive
as
possible.
To
repeat
an example
from Section
2.3, if
a system
cannot
u n d e r s t a n d " e x t e n s i o n " in "I would like the extension for Jim Smith", it s h o u l d ask "Do
w a n t t h e number for Jim Smith?", rather than "What do y o u mean by 'extension'?".
a
system
should
be
able to understand standard transformations
p h r a s i n g s in w h i c h it asks its questions.
you
Secondly,
and c o n v o l u t i o n s
of
the
Thus, if a system asks "Would y o u p r e f e r 7 o ' c l o c k
59
or
8 o'clock?", it s h o u l d certainly be able to understand "I p r e f e r
7 o'clock.", and
should
p r o b a b l y also u n d e r s t a n d convolutions as distant as 7 o'clock w o u l d be my p r e f e r e n c e . " .
H
In s u m m a r y , although a human user can be expected to compensate somewhat
for
the
d e f i c i e n c i e s of the output p r o d u c e d by a gracefully interacting system, the more n a t u r a l t h e
output
is, the more g r a c e f u l the system will appear to the user.
should
use
possible.
its
model
of
the
focus
H o w e v e r , sometimes
to generate
other
elliptical
In particular, the
and anaphoric
demands, particularly
those of
system
utterances
indirect
when
echoing,
can
o v e r r i d e this r e q u i r e m e n t for terseness and cause the information omitted from an u t t e r a n c e
b y e l l i p s i s and a n a p h o r a to be reintroduced.
Finally, a gracefully interacting s y s t e m s h o u l d
p r o d u c e o n l y o u t p u t that it can understand itself, in case it is transformed b y the u s e r
and
i n c o r p o r a t e d into his r e p l y .
9. Realizing Graceful Interaction
H a v i n g o u t l i n e d the basic components of graceful interaction, w e n o w t u r n to the q u e s t i o n
of
how
to
fit
the components
together
in a single gracefully
interacting s y s t e m .
f o l l o w i n g s u b s e c t i o n , w e give some design considerations for complete g r a c e f u l l y
the
interacting
s y s t e m s , a n d go o n to outline an architecture in line with these considerations f o r
i n t e r a c t i o n in simple s e r v i c e domains.
In
We are planning to implement a practical
graceful
gracefully
i n t e r a c t i n g i n t e r f a c e according to the p r o p o s e d architecture, but at the time of w r i t i n g h a v e
only
just
tentative
s t a r t e d to do this.
and subject
The
to revision.
architecture
proposed
must t h e r e f o r e
A second subsection follows
be
regarded
a hypothetical
i n t e r a c t i n g s y s t e m e m p l o y i n g the p r o p o s e d architecture through a realistic example
as
gracefully
dialogue
i n w h i c h many of the components of graceful interaction that w e have d i s c u s s e d c o m e i n t o
play.
9.1. A r c h i t e c t u r e of a Gracefully Interacting System
The
overall
important
d e s i g n of
characteristics
any
of
gracefully
the
interacting
interplay
between
system must
the
various
take
into
account
components
of
three
graceful
interaction.
-
A t a n y point in the t y p e of conversation w e are considering, the role of
g r a c e f u l l y interacting system can be described in terms of just one of
c o m p o n e n t s of g r a c e f u l interaction.
the
the
-
T h e s e v e r a l components of graceful interaction come into play in an i n h e r e n t l y
a s y n c h r o n o u s manner, in the sense that a system's use of one component may b e
i n t e r r u p t e d b y use of one of the others without the first coming to a
proper
conclusion.
60
-
C o m p o n e n t s i n t e r r u p t e d in this way can often be restarted from w h e r e t h e y left
o f f a f t e r the i n t e r r u p t i o n is over, but this is not inevitable.
C o n s i d e r , f o r example, a dialogue concerned with identification of some entity b y the s y s t e m
f r o m a u s e r ' s d e s c r i p t i o n . While this dialogue is taking place, the behaviour of the s y s t e m c a n
b e e x p l a i n e d p u r e l y in terms of the identification from description component.
time
this
line
dialogue, or
Section
of
conversation
can be
interrupted
in favour
of
(say) a
Y e t at
channel-checking
an e c h o c o r r e c t i o n , or e v e n an identification from a d i f f e r e n t d e s c r i p t i o n
6.4).
any
(see
M o r e o v e r , the original identification may or may not be taken u p a f t e r
the
i n t e r r u p t i o n is o v e r .
Because
the
different
components
of
graceful
interaction
can come
into
play
in
this
i n h e r e n t l y a s y n c h r o n o u s manner, we propose to organize our gracefully interacting s y s t e m as
a s e t o f a u t o n o m o u s modules, each capable of carrying on a particular t y p e of d i a l o g u e :
one
f o r c h a n n e l c h e c k i n g , one for identification from descriptions, etc..
one
At any time, e x a c t l y
o f t h e s e m o d u l e s w o u l d be active, in the sense that its state w o u l d hold the s y s t e m ' s p r i m a r y
v i e w of w h a t w a s h a p p e n i n g in the conversation.
However, the other modules w o u l d b e k e p t
i n a s t a t e of r e a d i n e s s , and if the user's next input could be dealt w i t h b e t t e r b y o n e of t h e
o t h e r m o d u l e s , it w o u l d be made the currently active module.
The state of a module u s u r p e d
in t h i s w a y w o u l d , of c o u r s e , be p r e s e r v e d in anticipation of a possible c o n t i n u a t i o n .
module
w o u l d indicate w h i c h inputs it was able to deal with b y
conditions;
the
inputs
it
can
deal
with
are
those
that
satisfy
Each
a set of e x p e c t a t i o n s
the
expectations.
e x p e c t a t i o n s c o u l d change according to what had gone before; thus, a d e s c r i p t i o n
or
These
identifier
m o d u l e w o u l d have d i f f e r e n t expectations after it had presented a list of alternative f i l l e r s o f
an
ambiguous
d e s c r i p t i o n , than
after
it
had
informed
the
user
that
a description
was
u n s a t i s f i a b l e ; again, a module for detecting echo corrections w o u l d o n l y have an e x p e c t a t i o n
at all a f t e r the s y s t e m had issued an echo.
W h a t modules are n e e d e d , and what are their functions?
We give h e r e our c u r r e n t
answer
f o r s i m p l e s e r v i c e systems; w e expect that most of the modules w o u l d stay the same f o r
g r a c e f u l l y i n t e r a c t i n g s y s t e m of greater abilities.
-
Channel maintainor:
This module can conduct a channel-checking dialogue.
It w i l l
i n i t i a t e o n e if the user fails to reply to the system, and will r e s p o n d to the u s e r ' s
i n i t i a t i o n of s u c h a dialogue.
-
U s e r ' s echo monitor: This module can conduct a dialogue to c o r r e c t i n c o r r e c t
e c h o e s b y the user of what the system said. It will detect any e c h o i n g b y the
u s e r and initiate a c o r r e c t i o n dialogue if the echo is incorrect.
-
Echo c o r r e c t i o n monitor: This module can detect any c o r r e c t i o n b y the user of
e c h o e s b y the system.
It will take part in a dialogue to e n s u r e that the
a
61
c o r r e c t i o n is u n d e r s t o o d p r o p e r l y .
-
Explicit incomprehension monitor: This module can engage in a dialogue to c l e a r
u p explicit complaints of complete or partial misunderstanding b y the user.
It
w i l l initiate its dialogue w h e n the user makes such an explicit complaint.
-
Incomprehension resolver:
This module is used w h e n e v e r the d e g r e e of
c o m p r e h e n s i o n of the user's input is unsatisfactory. According to the d e g r e e of
i n c o m p r e h e n s i o n and the importance of what was not understood, it can e c h o o r
r e q u e s t c l a r i f i c a t i o n explicitly, and participate in any clarifying dialogue.
(The
e c h o c o r r e c t i o n monitor could be included as part of this module, since it d e a l s
w i t h a logical continuation of echoing by the system).
-
D e s c r i p t i o n identifier: This module is used to identify entities from d e s c r i p t i o n s
b y t h e user. If n e c e s s a r y , it can request descriptions explicitly, and e n g a g e in
d i a l o g u e to c l a r i f y ambiguous or unsatisfiable descriptions.
-
A b i l i t y and goal describer:
This module can engage in a dialogue about the
s y s t e m ' s ability, either in response to direct questions, or as a last r e s o r t w h e n
i n c o m p r e h e n s i o n resolution does not appear to be working. It can also d e s c r i b e
t o the u s e r what it thinks the user is trying to do, how it is c o o p e r a t i n g w i t h
that, a n d w h a t the user needs to do to let it succeed; this information is available
t h r o u g h the focus of the system.
-
Initiator:
T h i s module is used to establish communication b e t w e e n the s y s t e m
a n d the user. It is always the first module activated. Besides e n s u r i n g that the
c o m m u n i c a t i o n channel is indeed open, it can conduct a dialogue about what t h e
s y s t e m is and can do; this allows the user to make certain that the s y s t e m is the
p a r t y he r e a l l y wants.
-
Terminator* This module is used w h e n the conversation needs to be t e r m i n a t e d .
It c a n initiate a termination w h e n the task has been completed, and can also b e
a c t i v a t e d w h e n the user decides to give up before finishing the task.
To
coordinate
all
these
modules, some
kind of
executive
is needed.
The
job
of
e x e c u t i v e is to direct the user's input to the appropriate module, and to make s u r e that
one
m o d u l e c o m p l e t e s its dialogue, control is turned over to the next a p p r o p r i a t e
F o r t h e s e r e a s o n s w e call this executive module the focus maintainer.
we
envisage
is
representation
more
than
one
tailored
directly
to simple service
as d i s c u s s e d in Section 4.3.
slot
by
The focus
systems, and assumes
a
module.
maintainer
It can handle input containing d e s c r i p t i o n s
parcelling the descriptions out
to fill; it c a n detect
when
frame-based
to appropriate
invocations
d e s c r i p t i o n i d e n t i f i e r module; it keeps track of which slot the user and s y s t e m are
trying
this
of
of
the
currently
any correction by the user of slots that have p r e v i o u s l y
f i l l e d ; it c a n initiate filling of a slot w h e n the user has not tried to fill it; it c a n d e a l
been
with
d e s c r i p t i o n s of s e v e r a l slots at once; it can decide when all slots have b e e n filled; and it c a n
k e e p t r a c k of the focus within a slot, i.e. which aspect of the entity d e s c r i b e d is b e i n g u s e d
t o r e s o l v e ambiguities.
62
P a r s i n g of
e a c h input will be done in accordance with the demands of
flexible
parsing
d i s c u s s e d in S e c t i o n 3.2; in particular, the parsing will be bottom-up, not s t r i c t l y d i r e c t i o n a l ,
a n d b a s e d o n pattern-matching.
Each module will parse its input s e p a r a t e l y (if this r e s u l t s in
t o o m u c h d u p l i c a t i o n of e f f o r t , partial results could be saved). This allows the e x p e c t a t i o n s o f
e a c h i n d i v i d u a l module to influence the parsing in two advantageous w a y s :
time
the
parser
secondly,
will only use patterns relevant to the module for w h i c h it is p a r s i n g ; a n d
expectations
of
specific
replies can be used to recognize
u t t e r a n c e s in themselves, as discussed in Sections 3.2 and 6.3.
from
the
user
first, at a n y g i v e n
that
satisfies
an
active
module's
expectations
ellipses
as
complete
Using these s t r a t e g i e s , i n p u t
can
be
recognized
with
a
minimum of s e a r c h .
H o w s h o u l d the s e v e r a l modules listed above be implemented?
e a c h of
Hearsay
them as a finite state network.
system
[19],
a
finite
state
One p o s s i b i l i t y is t o m o d e l
As was shown in the discourse c o m p o n e n t of
network
is
very
suitable
s t r a i g h t f o r w a r d c o n v e r s a t i o n in which nothing unexpected happens.
for
representing
the
a
The preceding dialogue
c a n b e r e p r e s e n t e d implicitly b y the state of a finite network plus p o s s i b l y the c o n t e n t s
of
s o m e r e g i s t e r s , and transition to other states can be made conditional on b o t h the input a n d
t h e c o n t e n t s of r e g i s t e r s , thus ensuring that the state reached b y a transition r e f l e c t s all t h e
r e l e v a n t h i s t o r y of the conversation, and enabling the system's r e s p o n s e to a u s e r ' s input t o
d e p e n d b o t h o n the input, and on what has gone before in the dialogue.
These
f e a t u r e s appear to make the finite state model extremely suitable for the
m o d u l e s d e s c r i b e d above, since each of them can, by definition, conduct
c o n v e r s a t i o n in w h i c h nothing unexpected happens.
several
a straightforward
Figure 1 shows a finite state net f o r a
s i m p l i f i e d d e s c r i p t i o n identifier module; the diamond boxes are tests and the s q u a r e b o x e s a r e
a c t i o n s , w h i l e the circles are states; transitions are made from state to state a c c o r d i n g t o t h e
c o n d i t i o n s , w h i c h t y p i c a l l y d e p e n d on the input, but in some cases also d e p e n d o n
(e.g. t h e test for one o p t i o n specified, after the ambiguous description state).
registers
It is e a s y t o
s e e f r o m t h e example that the expectations of a given module in a given state c o r r e s p o n d t o
t h e c o n d i t i o n s o n the arcs leading from that state.
Nevertheless,
disadvantages.
the
finite
These
state
model
for
the
individual
modules
suffers
from
certain
arise mainly because the implicit w a y in w h i c h state i n f o r m a t i o n
r e p r e s e n t e d in a finite state network.
is
In such a network it is natural to e n c o d e c u r r e n t g o a l s
a n d p a s t h i s t o r y implicitly in the state nodes.
In Figure 1, for example, the state " a m b i g u o u s
d e s c r i p t i o n " implicitly e n c o d e s the fact that the user has given the s y s t e m an
ambiguous
d e s c r i p t i o n , and the goal of the system to get the user to make that d e s c r i p t i o n u n a m b i g u o u s .
T h i s i m p l i c i t n e s s makes finite state networks relatively hard to modify, since the s t a t e implicit
63
Figure h Finite-state net for description identifier module
in a n o d e must in g e n e r a l be taken into consideration w h e n links to or from that n o d e
added or deleted.
M o r e serious perhaps is the problem that finite state modules w o u l d not
b e p e r s p i c u o u s to other modules, and in particular to the focus maintainer.
We believe that
c o m m u n i c a t i o n b e t w e e n modules about the goals that they are c u r r e n t l y p u r s u i n g may
very
i m p o r t a n t to a complete system.
It may be best to tackle the p r o b l e m of
b y r e p r e s e n t i n g e a c h module as an independent set of production rules.
possible
to
important.
are
compile
such
rule
sets
into finite state
networks
if e x t r a
modifiability
It s h o u l d still
efficiency
Communication b e t w e e n modules might be achieved in the context of
be
be
proved
independent
r u l e s e t s t h r o u g h s h a r e d global variables. On the other hand, it may b e n e c e s s a r y t o maintain
64
a s t a c k of s y s t e m and user goals In a w a y similar to the "dialogue games" model of L e v i n a n d
Moore
[25].
We
expect
to resolve
these questions
and many other
more d e t a i l e d
ones
t h r o u g h e x p e r i e n c e w i t h the implementation that w e are currently undertaking along t h e lines
presented
above
of
a
gracefully
interacting
system
for
composing, s e n d i n g , f i l i n g ,
and
r e t r e i v i n g e l e c t r o n i c mail.
9.2. W o r k e d Example
Below
we
list
a realistic
example of the sort
of dialogue
that w e
expect
a
gracefully
i n t e r a c t i n g s y s t e m to be able to participate in; The example was c o n s t r u c t e d to i l l u s t r a t e most
o f t h e c o m p o n e n t s of interaction that w e have discussed.
s i m p l e s e r v i c e s y s t e m for restaurant reservations.
The system in the e x a m p l e is a
We will follow t h r o u g h the d i a l o g u e
and
s e e h o w the s e v e r a l modules d e s c r i b e d above contribute to the system's p a r t i c i p a t i o n .
S:
U:
S:
U:
S:
Hello! T h i s is the B r o w n C o w restaurant.
H e l l o ! T h e B r o w n C o w restaurant?
Y e s ! W o u l d y o u like to make a reservation?
Y e s ! I'd like one for eight tonight.
A t eight o'clock, and for how many people?
5
U: T h e r e will b e (five) of us.
<clicks o n line>
U: Hello! Hello! A r e y o u there?
S: Y e s ! C a n y o u hear me?
U: Y e s I c a n .
S: OK. F o r t h r e e p e o p l e , and what name will that be under?
U: N o ! I s a i d f i v e .
S: O h ! F i v e p e o p l e . I can't give y o u eight o'clock; it would have to be s e v e n o r
U:
S:
U:
S:
U:
S:
U:
S:
10
a f t e r nine thirty.
Y o u don't have anything b e t w e e n those times?
N o ! I'm afraid w e don't.
W e l l , w e ' l l wait until the later time. Was it nine or nine thirty?
It w a s nine thirty. What name will that be under?
Teacon.
W a s that B e a c o n w i t h a 'B', or Deacon with a 'D'?
It's T e a c o n w i t h a 'T'; T E A C O N .
T E A C 0 N. V e r y g o o d ! We'll expect five of y o u at nine thirty.
15
20
U: OK. T h a n k y o u !
S: Y o u ' r e w e l c o m e ! B y e !
U: B y e !
The
first
t h r e e lines are dealt w i t h by the initiator module.
A c o n v e r s a t i o n is
always
s t a r t e d b y the activation of this module, and its first action is to give a g r e e t i n g w h i c h b o t h
i n f o r m s t h e u s e r that the channel of communication is o p e n , and tells him w h o he is t a l k i n g t o .
O n line 2, t h e u s e r asks for some reassurance that he is talking to the right p a r t y ; the m o d u l e
p r o v i d e s it, d e - a c t i v a t e s itself, and passes control back to the focus maintainer; S i n c e
focus
maintainer
has no input to w o r k on, it asks the user if he wants to u s e t h e
the
(only)
65
s e r v i c e that the s y s t e m can provide.
If, at
this
point, the user
had still not been satisfied with whom it was
i n i t i a t o r module w o u l d have b e e n re-activated.
taking to, t h e
Instead, the user satisfies the e x p e c t a t i o n of
t h e f o c u s maintainer b y s p e c i f y i n g a reservation.
We are assuming a f r a m e - b a s e d s y s t e m , s o
t h e f o c u s maintainer must now activate the description identifier module for e a c h slot c o v e r e d
b y the specification.
Since the two slot specifications are uniquely identifying, the i d e n t i f i e r
m o d u l e t e r m i n a t e s in b o t h cases without generating any output.
However, the p r i o r c h o i c e o f
w h i c h s l o t s to use w a s not straightforward: "tonight" specifies the date slot, but "for e i g h t "
c o u l d s p e c i f y e i t h e r the p a r t y size or the time. The focus maintainer has t w o o p t i o n s :
either
a s k t h e u s e r e x p l i c i t l y w h i c h one he meant (using the incomprehension r e s o l v e r module), o r
c h o o s e o n e of t h e o p t i o n s o n its o w n initiative.
Since eight is a sufficiently large p a r t y s i z e ,
a n d a s u f f i c i e n t l y p o p u l a r r e s e r v a t i o n time to make the time i n t e r p r e t a t i o n s i g n i f i c a n t l y m o r e
p l a u s i b l e , it makes its o w n choice.
However, an assumption like this cannot b e made w i t h o u t
i n f o r m i n g t h e u s e r , s o the assumption is included in an echo at the start of line f i v e .
Making
t h e e c h o w i l l set u p an e x p e c t a t i o n by the echo correction monitor that the user w i l l c o r r e c t
t h e a s s u m p t i o n ; h o w e v e r , in this case the system struck lucky; the e c h o is not c h a l l e n g e d , a n d
t h e a s s u m p t i o n is t h e r e b y implicitly confirmed.
After
e c h o i n g , the focus maintainer selects an empty slot, the p a r t y s i z e , and s t a r t s
d e s c r i p t i o n i d e n t i f i e r for that slot, which in turn asks for the p a r t y s i z e .
conforms
to
the
unambiguously
expectations
a party
of
the
now
active
description
The user's
identifier
by
the
reply
specifying
size of what the system hears as three, but w h i c h r e a l l y
is
five.
H o w e v e r , the s y s t e m is sufficiently unsure of its perception of the number to mark it d o w n
f o r an e c h o .
At
this
checking
point,
an
dialogue.
unexpected
sound on the line causes
This
satisfies
input
the
ever-present
the
user
to initiate
expectations
of
a
the
channel
channel
m a i n t a i n e r , s o that it becomes active at this point and conducts the dialogue up until t h e OK
of
line
11.
T h e conflict
b e t w e e n the satisfaction of the expectations of b o t h t h e
channel
m a i n t a i n e r , and the d e s c r i p t i o n identifier is resolved by a p r e c e d e n c e o r d e r i n g o n the s e v e r a l
modules; the
pressing
precedence
concerns
is analogous to an interrupt
priority
scheme
in w h i c h
the
r e c e i v e the highest precedence, so that the channel maintainer has
most
the
h i g h e s t p r e c e d e n c e , f o l l o w e d b y the echo modules, with the description identifier h a v i n g t h e
lowest
precedence.
N o w that t h e channel has b e e n re-established, the description identifier can c o n t i n u e
from
w h e r e it left off, s o it e c h o e s the party-size, and then de-activates itself, allowing the f o c u s
m a i n t a i n e r t o take o v e r .
T h e focus maintainer then selects the only slot still to b e f i l l e d , a n d
66
starts
up
the
description
identifier
on that slot, resulting
in the
request
for
H o w e v e r , the user has d e t e c t e d the incorrect echo and issued a c o r r e c t i o n .
the
name.
This does
not
s a t i s f y the e x p e c t a t i o n s of the active description identifier, but does satisfy those of t h e e c h o
c o r r e c t i o n monitor.
T h e monitor corrects the description, and, after r e - e c h o i n g to e n s u r e that
t h e m e s s a g e has b e e n r e c e i v e d correctly this time, terminates, allowing the f o c u s
maintainer
t o r e s t a r t the p r e v i o u s invocation of the description identifier with the n e w d e s c r i p t i o n .
U n f o r t u n a t e l y , the d e s c r i p t i o n is no longer satisfiable, since no table for
a v a i l a b l e at eight o'clock.
five
people
In this situation, the normal response of the d e s c r i p t i o n
is
identifier
w o u l d b e to look for near-misses for the party-size, but it w o u l d clearly b e a mistake
o f f e r t h e u s e r a table for three or four at eight o'clock.
The problem arises b e c a u s e of t h e
i n t e r a c t i o n b e t w e e n the p a r t y - s i z e and the time in determining w h e t h e r
satisfiable.
to
the d e s c r i p t i o n
is
T h e s y s t e m , t h e r e f o r e , provides a method of s p e c i f y i n g w h i c h slots o f a f r a m e
c o n s t r a i n w h i c h o t h e r s , and o f f e r s a mechanism for specifying which to change to r e s o l v e a n y
c o n f l i c t s that a r i s e .
In this case, the time, date, and p a r t y - s i z e all interact to c o n s t r a i n e a c h
other, and changes
are to be attempted in that order.
The net e f f e c t
is that i n s t e a d
i n d i c a t i n g that f i v e is an unsatisfiable description of a p a r t y - s i z e , the d e s c r i p t i o n
of
identifier
f o r p a r t y - s i z e a c c e p t s five as a unique description, terminates, and r e s t a r t s the d e s c r i p t i o n
i d e n t i f i e r f o r time w i t h the previously given description.
T h i s time the normal p r o c e d u r e for dealing with unsatisfiable descriptions a p p l i e s , a n d t h e
s y s t e m lists its near misses.
t h e d e s c r i p t i o n identifier
straightforward
The user's reply in line 14 satisfies o n e of the e x p e c t a t i o n s o f
b y essentially repeating the same specifications.
affirmation that the limitations given are truly c o r r e c t .
T h e r e p l y is
The tendency
for
h u m a n s t o ask f o r confirmation w h e n they hear something that they don't w a n t t o h e a r
very
c o m m o n , and t h e r e
should perhaps
be
a larger
recognition of
that
in a
a
is
gracefully
i n t e r a c t i n g s y s t e m , but w e have not investigated that possibility.
T h e u s e r next accepts the later of the two times, but asks the system to c o n f i r m e x a c t l y
w h i c h time it is.
T h e first part of this input terminates the description identifier f o r t h e time,
a n d t h e s e c o n d part is handled b y the explicit incomprehension monitor, w h i c h g e n e r a t e s t h e
first
part
of
line
17.
The
second
part
is generated
by
a re-initialized
version
d e s c r i p t i o n i d e n t i f i e r for the name in which the reservation is to be made.
of
the
T h e module is
r e - i n i t i a l i z e d b e c a u s e a module cannot be continued from w h e r e it left off if o t h e r m o d u l e s at
the
same
priority
level
have
participated
in
the
intervening
dialogue.
The
ensuing
d e t e r m i n a t i o n of the name shows some of the special problems that can arise w h e n
n a m e s not p r e v i o u s l y k n o w n to the system have to be communicated.
descriptions
properly,
r e c o g n i z e spellings.
a system
needs
a catalogue
of
common
names
proper
To understand
and
an
such
ability
to
67
Finally
points.
o n line 2 1 , the r e s e r v a t i o n is complete and the system reconfirms
A g a i n this is an echo which the user could contradict.
the
essential
In the e v e n t , he is s a t i s f i e d
a n d t h e c o n v e r s a t i o n is finished off by the terminator module.
Conclusion
G r a c e f u l i n t e r a c t i o n is of fundamental importance for all computer computer s y s t e m s that
interact
w i t h humans.
Without graceful interaction skills, interactive computer s y s t e m s
c o n t i n u e to a p p e a r u n c o o p e r a t i v e , uncompromising, and altogether obtuse to the
user.
will
non-expert
Y e t , as w e have s e e n , graceful interaction is a little studied field; some w o r k has b e e n
d o n e o n s o m e of the skills r e q u i r e d , but such efforts have generally b e e n tangential t o t h e
m a i n t h r u s t of the systems in which they w e r e embedded.
No w o r k e r s have a t t e m p t e d t o
s t u d y t h e g r a c e f u l interaction problem as a whole.
In t h i s p a p e r w e have attempted to circumscribe graceful interaction as a f i e l d f o r
To
this e n d , w e
robust
h a v e d e f i n e d a set of basic components of graceful i n t e r a c t i o n ,
study.
including
c o m m u n i c a t i o n , flexible parsing, explanation facilities, focus mechanisms, i d e n t i f i c a t i o n
f r o m d e s c r i p t i o n s , and g e n e r a t i o n of descriptions.
These components are c e r t a i n l y
necessary
f o r a n y k i n d of g r a c e f u l interaction, but are also sufficient, w e claimed, for s y s t e m s r e s t r i c t e d
t o o f f e r i n g o n e of the v e r y large class of simple services.
F i n a l l y , w e b e l i e v e that graceful interaction is an idea whose time has come, and is r i p e f o r
implementation.
W e claim that none of the components w e d e s c r i b e d is significantly
beyond
t h e c u r r e n t s t a t e of the art in dialogue processing (at least not for simple s e r v i c e s y s t e m s ) ,
and
we
have
presented
an
g r a c e f u l l y i n t e r a c t i n g system.
architecture
for
their
integrated
implementation
in
a
single
A gracefully interacting simple service s y s t e m (for s e n d i n g a n d
r e c e i v i n g e l e c t r o n i c mail) conforming to this architecture is c u r r e n t l y under implementation.
Acknowledgements
We
are
particularly
grateful
to Jaime Carbonell
and Candy
Sidner
for
their
insightful
c o m m e n t s o n e a r l i e r v e r s i o n s of this paper.
References
1.
A u s t i n , J . L.
How to Do Things with Words. Oxford University P r e s s , 1962.
2. B o b r o w , D. G., Kaplan, R. M., Kay, M., Norman D, A., Thompson, H., and W i n o g r a d , T .
F r a m e - D r i v e n Dialogue System. Artificial Intelligence 8 (1977), 1 5 5 - 1 7 3 .
GUS: a
63
3.
C a r b o n e l l , J . G. Subjective Understanding: Computer Models of Belief Systems.
Ph.D. T h . ,
Y a l e University, 1979.
4.
C a r b o n e l l , J . R.
Mixed-Initiative Man-Computer Dialogues, tech r e p o r t 1970, Bolt,
B e r a n e k , and N e w m a n , Inc., 1971.
5.
C h a r n i a k , E. C.
T o w a r d a Model of Children's Story Comprehension.
TR-266, MIT AI Lab,
C a m b r i d g e , Mass., 1972.
6.
C o d d , E. F.
S e v e n Steps to RENDEZVOUS with the Casual User.
In Klimbie, J . W. and
K o f f e m a n , K. L., Ed., Proc. IF1P TC-2 Working Conf on Data Base Management Systems, N o r t h
H o l l a n d , A m s t e r d a m , 1974, p p . 179-200,
7.
C o h e n , P. R.
On Knowing What to Say: Planning Speech Acts. Ph.D. Th., Dept. of C o m p u t e r
S c i e n c e , U n i v e r s i t y of T o r o n t o , January 1978. also available as T e c h . Report 1 1 8
8 . C o l b y , K. M. Simulations of Belief Systems. In Schank, R. C. and C o l b y , K. M., Ed.,
Computer Models of Thought and Language, Freeman, San Francisco, 1973, p p . 2 5 1 - 2 8 6 .
9.
D o y l e , J . T r u t h Maintenance Systems for Problem Solving. T R - 4 1 9 , MIT A I Lab.,
C a m b r i d g e , Mass., 1978.
10.
F a h l m a n , S. E.
NETL: A System for Representing and Using Real-World Knowledge. M I T
P r e s s , C a m b r i d g e , Mass., 1979.
11.
F o x , M. S., and M o s t o w , D. J . Maximal Consistent Interpretations of E r r o r f u l Data in
H i e r a r c h i c a l l y M o d e l l e d Domains. Proc. Fifth Int. Jt. Conf. on Artificial Intelligence, M I T , 1 9 7 7 ,
pp. 165-171.
12.
G r i c e , H. P.
Semantics:
13.
Logic and Conversation.
In Cole, P. and Morgan, J . L., Ed., Syntax
and
Speech Acts, Academic Press, New York, 1975.
G r i m e s , J . E.
T o p i c Levels.
Proc. of Theoretical Issues in Natural Language P r o c e s s i n g :
A n I n t e r d i s c i p l i n a r y W o r k s h o p , University of Illinois at Urbana-Champaign, July, 1 9 7 8 , p p .
104-108.
14. G r o s z , B. J . T h e R e p r e s e n t a t i o n and Use of Focus in a System for U n d e r s t a n d i n g
D i a l o g u e s . P r o c . F i f t h Int. Jt. Conf. on Artificial Intelligence, MIT, 1977, p p . 6 7 - 7 6 .
15.
G r o s z , B. J . F o c u s i n g in Dialogue. Proc. of Theoretical Issues in Natural L a n g u a g e
P r o c e s s i n g : A n Interdisciplinary Workshop, University of Illinois at U r b a n a - C h a m p a i g n ,
July, 1978.
16.
H a y e s , P. J . On Semantic Nets, Frames, and Associations.
Proc. Fifth Int. Jt. Conf. o n
A r t i f i c i a l Intelligence, MIT, 1977, p p . 9 9 - 1 0 7 .
17.
H a y e s , P. J . A R e p r e s e n t a t i o n for Robot Plans. Proc. Fourth Int. Jt. Conf. o n A r t i f i c i a l
I n t e l l i g e n c e , T b i l i s i , G e o r g i a , 1975, pp. 181-188.
18.
H a y e s - R o t h , F., Erman, L. D., Fox, M., and Mostow, D. J . Syntactic P r o c e s s i n g in
HEARSAY-II.
S p e e c h Understanding Systems. Summary of Results of the F i v e - Y e a r
E f f o r t at C a r n e g i e - M e l l o n University, Carnegie-Mellon University Computer S c i e n c e
Department, 1976.
Research
69
1 9 . H a y e s - R o t h , F., Gill, G., and M o s t o w , D. J. Discourse and Task P e r f o r m a n c e in H E A R S A Y - I I .
S p e e c h U n d e r s t a n d i n g Systems. Summary of Results of the F i v e - Y e a r R e s e a r c h E f f o r t at
C a r n e g i e - M e l l o n U n i v e r s i t y , C a r n e g i e - M e l l o n University Computer Science Department, 1 9 7 6 .
2 0 . H e n d r i x , G. G. Human Engineering for A p p l i e d Natural Language P r o c e s s i n g
Int. J t . C o n f . o n A r t i f i c i a l Intelligence, MIT, 1977, pp. 1 8 3 - 1 9 1 .
/
F
2 1 . H e n d r i x , G. G. E x p a n d i n g the.Utility of Semantic N e t w o r k s t h r o u g h Par â„¢ " '
F o u r t h Int. Jt. C o n f . o n Artificial Intelligence, Tbilisi, Georgia, 1975, p p .
lir
g
22.
H o c k e t t , C. F.
23.
K a p l a n , S. J . Cooperative
System.
A Course in Modern Linguistics. MacMillan, New
Yor
f
t
h
°°'
'
t 9 5 9
Responses from a Portable Natural langu' , £
D
Ph.D. Th., Dept. of Computer and Information Science, U n i v e r s i f
P r
,
ta
P
Ba5
e
n
n
s
.
C
y
Q
,
v
a
n
u
B
,
a
r
y
'
Philadelphia, 1979.
24.
L e h n e r t , W.
The Process of Question Angering.
25.
L e v i n , J.*A., a n d M o o r e , J . A.
L a n g u a g e Understanding.
26.
l?»>- >~ ~ '
'> Hillside, N. 1, 1 9 7 8 .
Dialfue Games: Meta-Communtation S t r u c t u r e s for N a t u r a l
Cognitive uence I, 4 (1977), 3 9 5 - 4 2 0 .
L e v i n , J . A. a n d Goldman N. M P r o c e s s Models of r,L/ere?7;e in Context.
1SI/RR-78-72,
I n f o r m a t i o n S c i e n c e s institute, U^. of Southern California, October, 1978.
27.
M a n n , W. C , M o o r e d A., and Levin, J . A. A Comprehension M o d e l for Human D i a l o g u e .
P r o c . F i f t h Int. Jt. Conf.
28.
M c K e o w n , K.
System.
29.
Artificial Intelligence, MIT, 1977, pp. 7 7 - 8 7 .
P a r a p l r a s i n g Using Given and New Information in a Q u e s t i o n - A n s w e r i n g
A s s o c i a t i o n forComputationai Linguistics Meeting, San Diego, 1 9 7 9 .
M i n s k y , M.
Psychology
30.
n
A Framework for Representing Knowledge.
In W i n s t o n , P., Ed., The
of Computer Vision, M c G r a w Hill, 1975, pp. 2 1 1 - 2 7 7 .
P a r k i s o n , R. C , C o l t y . K. M., and Faught, W. S. Conversational Language C o m p r e h e n s i o n
U s i n g I n t e g r a t e d P a t t e r n - M a t c h i n g and Parsing. Artificial Intelligence 9 (1977), 1 1 1 - 1 3 4 .
31.
R a p h a e l , B.
L., E d . , Semantic
SIR: A Computer Program for Semantic Information R e t r i e v a l .
Information
In M i n s k y , M .
Processing, MIT P r e s s , Cambridge, Mass., 1968, p p . 3 3 - 1 3 4 .
3 2 . R i e s b e c k , C. K. C o m p u t a t i o n ^ Understanding: Analysis of S e n t e n c e s and C o n t e x t .
W o r k i n g P a p e r 4, Istituto p e r gli Studi Semantici e Cognitivi, Castagnola, S w i t z e r l a n d , 1 9 7 4 .
33.
S a c e r d o t i , E. D.
L a n g u a g e Access , f
0
Distributed Data w i t h E r r o r R e c o v e r y .
Proc. Fifth
Int. J t . C o n f . o n / ' t i f i c i a l Intelligence, tv^T, 1977, pp. 1 9 6 - 2 0 2 .
34.
S a c e r d o t i , E. t
2 U97'
3
5
-
197
3
6
-
4 ) i
Planiyjn* <r l . ^
n
r a
r r h y o f Abstraction Spaces. Artificial
Intelligence
S,
115-l^E.
' S c r a g g , G. W .
Answering Questions about Processes. Ph.D. Th., U. of C a l i f o r n i a , S a n D i e g o ,
4 #
S e a r l e , J . R.
Speech Acts. Cambridge University P r e s s , 1969.
70
3 7 . S i d n e r , C, A P r o g r e s s Report o n the Discourse and Reference Components of P A L .
M e m o . 4 6 8 , MIT A. I. Lab., 1978.
A. I.
3 8 . S u s s m a n , G. J . and McDermott, D. V. From PLANNER to CONNIVER - a Genetic A p p r o a c h .
P r o c . Fall Joint C o m p u t e r Conf., AFIPS Press, Montvale, N.J., 1972, pp. 1 1 7 1 - 1 1 7 9 .
3 9 . W a l k e r , D. S p e e c h Understanding Research, Final Report, Project 4 7 6 2 .
I n t e l l i g e n c e C e n t e r , S t a n f o r d Research Institute, Menlo Park, Ca., 1976.
Artificial
4 0 . W a l t z , D. L. A n English Language Question A n s w e r i n g System for a L a r g e Relational Data
B a s e . Comm. ACM 21, 7 (1978), 5 2 6 - 5 3 9 .
4 1 . W e i s c h e d e l , R. M. and Black, J . Responding to Potentially Unparseable S e n t e n c e s , t e c h .
r e p o r t 7 9 / 3 , Dept. of Computer and Information Sciences, University of D e l a w a r e , 1 9 7 9 .
4 2 . W e i z e n b a u m , J . ELIZA - A Computer Program for the Study of Natural Language
C o m m u n i c a t i o n b e t w e e n M a n and Machine. Comm. ACM 9, 1 (January 1966), 3 6 - 4 5 .
4 3 . W i l k s , Y. A. P r e f e r e n c e Semantics. In Keenan, Ed., Formal Semantics of
L a n g u a g e , C a m b r i d g e University Press, 1975.
44.
W i n o g r a d , T.
45.
W o o d s , W. A.
Understanding
Natural
Natural Language. Academic P r e s s , New Y o r k , 1 9 7 2 .
T r a n s i t i o n Network Grammars for Natural Language A n a l y s i s .
Comm.
ACM
1 3 , 1 0 ( O c t o b e r 1970), 5 9 1 - 6 0 6 .
46.
W o o d s , W. A., Kaplan, R. M., and Nash-Webber, B. The Lunar Sciences Language S y s t e m :
Final Report.
2 3 7 8 , Bolt, Beranek, and Newman, Inc., 1972.
4 7 . W o o d s , W. A., Bates, M., B r o w n , G., Bruce, B., Cook, C , Klovstad, J., Makhoul, J.,
N a s h - W e b b e r , B., S c h w a r t z , R., Wolf, J., and Zue, V. Speech Understanding Systems - F i n a l
T e c h n i c a l R e p o r t . 3 4 3 8 , Bolt, Beranek, and Newman, Inc., 1976.