An Anatomy of Graceful Interaction in Spoken and Written Man-Machine Communication

Philip J. Hayes
Dabbala Rajagopal Reddy

CMU-CS-79-144
13 September 1979

Computer Science Department
Carnegie-Mellon University
Pittsburgh, PA 15213

This research was sponsored by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 3597, monitored by the Air Force Avionics Laboratory under Contract F33615-78-C-1551.

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the US Government.

Abstract

There have recently been a number of attempts to provide natural and flexible interfaces to computer systems through the medium of natural language.
While such interfaces typically perform well in response to straightforward requests and questions within their domain of discourse, they often fail to interact gracefully with their users in less predictable circumstances. Most current systems cannot, for instance: respond reasonably to input not conforming to a rigid grammar; ask for and understand clarification if their user's input is unclear; offer clarification of their own output if the user asks for it; or interact to resolve any ambiguities that may arise when the user attempts to describe things to the system. We believe that graceful interaction in these and the many other contingencies that can arise in human conversation is essential if interfaces are ever to appear cooperative and helpful, and hence be suitable for the casual or naive user, and more habitable for the experienced user. In this paper, we attempt to circumscribe graceful interaction as a field for study, and identify the problems involved in achieving it. To this end we decompose graceful interaction into a number of relatively independent skills: skills involved in parsing elliptical, fragmented, and otherwise ungrammatical input; in ensuring robust communication; in explaining abilities and limitations, actions and the motives behind them; in keeping track of the focus of attention of a dialogue; in identifying things from descriptions, even if ambiguous or unsatisfiable; and in describing things in terms appropriate for the context. We claim these skills are necessary for any type of graceful interaction and sufficient for graceful interaction in a certain large class of application domains. None of these components is individually much beyond the current state of the art, and we outline the architecture of a system that integrates them all. Thus, we propose graceful interaction as an idea of great practical utility whose time has come and which is ripe for implementation. We are currently implementing a gracefully interacting system along the lines presented; the system will initially deal with typed input, but is eventually intended to accept natural speech.

Table of Contents

Introduction
1. Components of Graceful Interaction
2. Robust Communication
2.1. Introduction - The Principle of Implicit Confirmation
2.2. Implicit Acknowledgements
2.3. Explicit Indications of Incomprehension
2.4. Echoing and the Use of Fragmentary Recognition
2.5. Summary of Robust Communication
3. Flexible Parsing
3.1. Grammatical Deviations
3.2. Implications for Language Analysers
3.3. Summary of Flexible Parsing
4. Domain Knowledge
4.1. The Role of Domain Knowledge
4.2. Simple Services
4.3. Representations for Simple Service Domains
5. Explanation Facility
5.1. Explanation Types
5.2. Questions About Ability - Indirect Speech Acts
5.3. Questions About Ability in a Restricted Domain
5.4. Event Questions
5.5. Hypothetical Questions
5.6. Ability Models as a Safety Net
5.7. Summary of Explanation Facilities
6. Goals and Focus
6.1. Goals
6.2. Subgoals in Simple Service Domains
6.3. Focus
6.4. Keeping Track of Focus
6.5. Summary of Goals and Focus
7. Identification from Descriptions
7.1. The Identification Problem
7.2. Ambiguous Descriptions
7.3. Unsatisfiable Descriptions
7.4. Descriptions and Faulty Comprehension
7.5. Summary of Identification from Descriptions
8. Language Generation
9. Realizing Graceful Interaction
9.1. Architecture of a Gracefully Interacting System
9.2. Worked Example
Conclusion

Introduction

A great deal of interest has recently been shown in providing natural and flexible computer interfaces through systems capable of engaging in a dialogue with a human being in more or less natural language. This interest is embodied in numerous systems including GUS [2], LIFER [20], PAL [37], PARRY [30], and PLANES [40], and in the work of Codd [6], Grosz [14], and others. Such systems typically respond accurately and appropriately to straightforward requests and questions or otherwise uneventful dialogue within their domain of discourse. Compared to human beings, however, the performance of current natural language interfaces appears quite rigid and fragile. Most current systems cannot, for instance: respond reasonably to input not conforming to a rigid grammar; ask for and understand clarification if their user's input is unclear; offer clarification of their own output if the user asks for it; or interact to resolve any ambiguities that may arise when the user attempts to describe things to the system. A dialogue system cannot hope to interact naturally and gracefully with its users unless it can cope with these and the many other contingencies so common in ordinary human conversation. Graceful interaction, then, involves dealing appropriately with anything a user happens to say, rather than just those inputs in the mainstream of a conversation.
Even though little attention has been paid to it in work to date (PARRY [30] being the main exception), we believe that graceful interaction is essential in making natural language interfaces (for both typed and spoken input) appear cooperative and helpful to their users, and thus in making computer systems more accessible to casual or naive users, and more pleasant and generally habitable for the more experienced user.

In this paper, we claim that the ability to interact gracefully depends on a number of relatively independent skills: skills involved in parsing elliptical, fragmented, and otherwise ungrammatical input; in ensuring robust communication; in explaining abilities and limitations, actions and the motives behind them; in keeping track of the focus of attention of a dialogue; in identifying things from descriptions, even if ambiguous or unsatisfiable; and in describing things in terms appropriate for the context. While none of these components of graceful interaction has been entirely neglected in the literature, no single current system comes close to having most of the abilities and behaviours we describe, and many are not possessed by any current systems. This will become clear in our discussion of each component. We believe, however, that none of these components is individually much beyond the current state of the art, and later in the paper we will present the architecture of a system that combines them all. Thus, we believe that graceful interaction is an idea of great practical utility whose time has come, and which is ripe for implementation.
This paper is an attempt to circumscribe graceful interaction as a field of study, to identify the problems that need to be solved, and to present the design of a system that, for a suitably restricted domain, is truly capable of graceful interaction. We are currently implementing a gracefully interacting system along the lines presented; the system will initially deal with typed input, but is eventually intended to accept natural speech.

1. Components of Graceful Interaction

Graceful interaction is not a single monolithic skill. Rather, it seems to be composed of a number of diverse abilities and behaviours. In the succeeding sections, we will describe a set of abilities and behaviours that appear to be essential for graceful interaction. Although this set contains necessary components of graceful interaction, it is probably not sufficient for graceful interaction in general; however, we believe it provides a good working basis from which to build gracefully interacting systems, and in particular, those which provide a simple service in a restricted domain (e.g. telephone directory assistance, restaurant reservations, computer mail services); we will define this class a little more carefully in Section 4.2.

The components of graceful interaction we describe are based on phenomena observable in naturally occurring human dialogues.
We believe it is very important for a gracefully interacting system to conduct a dialogue in as human-like a way as possible; if the strategies the system employs for clarifying its incomprehensions of the user, resolving ambiguous descriptions supplied by the user, etc. are not the same as those a human would use in the same situation, then the user will feel that the interaction is not natural, and hence not graceful. Furthermore, most of the components of graceful interaction we have gleaned from human conversations involve cooperation between speaker and listener; if the user tries to employ any of these techniques in his interaction with the system, an inability of the system to cooperate will appear to him most ungraceful. Thus the freedom we wish to give the user to express himself exactly as he wishes requires that a gracefully interacting system be able to deal with dialogue problems in much the same way as a human does.

Unfortunately, the aim of being as human-like as possible must be tempered by the limited potential for comprehension of any foreseeable computer system. Until a solution is found to the problems of organizing and using the range of world knowledge possessed by a human, practical systems will only be able to comprehend a small amount of input, typically within a specific domain of expertise.
Graceful interaction must, therefore, supplement its simulation of human conversational ability with strategies to deal naturally and gracefully with input that is not fully understood, and, if possible, to steer a conversation back to the system's home ground.

We propose seven components of graceful interaction. This set is probably not sufficient, nor are the boundaries between the several components completely watertight; arguments could be made for drawing them in different places. However, it would be hard to argue that any of the seven were unimportant for graceful interaction. The components we propose are:

- Robust communication: The set of strategies needed to ensure that a listener receives a speaker's utterance, and interprets it correctly.

- Flexible parsing: The ability to deal with naturally used natural language, with all the ellipses, idioms, grammatical errors, and fragmentary utterances it can contain.

- Domain knowledge: Not strictly speaking a component of graceful interaction, but a prerequisite for all the other components.

- Explanation facility: The ability of the system to explain what it can and cannot do, what it has done, what it is trying to do, and why, both in response to direct questions, and as a fall-back when communication breaks down.

- Focus mechanisms: The ability to keep track of what the conversation is about, as the items under discussion shift; this is important for the resolution of ellipsis and anaphora, as well as for continuity in the conversation.

- Identification from descriptions: The ability to recognize an object from a description; this involves the ability to pursue a clarifying dialogue if the original description is not clear.

- Generation of descriptions: The ability to generate descriptions that are appropriate for the context, and that satisfy requirements imposed by the other components, especially robust communication.

While none of these components of graceful interaction has been entirely neglected in the literature, no single current system comes close to having most of the abilities and behaviours we describe, and many are not possessed by any current systems. In what follows, we will consider each component in detail; in particular, in each case we will discuss:

- the natural language and dialogue phenomena on which the component is based;

- the abilities and behaviours needed by a computer dialogue system to participate gracefully in a conversation containing these phenomena (including those differing from humans, but necessitated by those limitations of current dialogue systems that are unlikely to be removed in the near future);

- the extent to which any current systems contain the component (including any reasons why their basic structure or methods might preclude their incorporation of the component, and the implications of this for the design of systems which try to implement the component).
In these discussions, we will in general make no distinction between spoken and typed language; this reflects our view that most of the principles of graceful interaction are the same whether the medium of communication is speech or text. In cases where it does make a difference, mainly those involving problems with speech that do not arise for typed input, we will note the difference explicitly. The discussions will be illustrated by example conversation fragments; these conversations are intended to be typical of those that might occur between a gracefully interacting system and its user; accordingly, we have set them in simple service domains of the type mentioned above (see Section 4.2 for a more precise definition of simple service domains). In fact, we have used two domains of discourse in our examples: an internal directory assistance system, and a restaurant reservation system. The examples chosen are based in part on protocols of actual conversations (with two human participants), and in part on conversations invented to illustrate particular points.

2. Robust Communication

During the course of a conversation, it is not uncommon for people to misunderstand or fail to understand each other. Such failures in communication do not usually cause the conversation to break down; rather, the participants are able to resolve the difficulty, usually by a short clarifying sub-dialogue, and continue with the conversation from where they left off. Current computer systems are unable to take part in such clarifying dialogues, or resolve communication difficulties in any other way. As a result, when such difficulties occur, a computer dialogue system is unable to keep up its end of the conversation, and a complete breakdown is likely to result; this fragility lies in stark and unfavourable contrast to the robustness of human dialogue. In this section, we discuss the cooperative techniques that humans use to overcome difficulties in communication, and the ways in which these techniques can be made available to computer dialogue systems.

2.1. Introduction - The Principle of Implicit Confirmation

Every time a human being speaks to another in an attempt to communicate something, he faces three obstacles to getting his message across:

- the listener may not receive the message;

- the listener may receive the message but be unable to interpret all or part of it;

- the listener may interpret the message incorrectly.

Although these difficulties do not occur in the majority of attempts at communication, they are not uncommon in ordinary human conversation and arise completely unpredictably. If one of them does occur, the listener will not receive the message that the speaker intended to convey, with potentially serious consequences for the conversation. Despite these unpredictable errors in communication, human dialogue is extremely robust; people almost always manage to get their message across. This robustness does not stem from the elimination or minimization of such errors, but rather from a set of techniques, based on cooperation between speaker and listener, that humans use to detect, recover from, and correct errors that occur. It is these techniques and their application to graceful interaction that we will be considering in this and the following subsections. The present subsection, in particular, discusses a convention, tacitly agreed by participants in a human dialogue, on which all the techniques are based.

Since a speaker typically has no direct way of telling whether his listener has received his message correctly, detection of communication difficulties must rest on some convention of acknowledgement, commonly agreed upon by speaker and listener. One possible convention is for the listener to explicitly acknowledge everything that the speaker says. This is essentially the technique employed for communication between networks of computers in which analogous communication problems can arise. While this convention ensures extremely accurate communication, constant explicit acknowledgement would be too tedious for humans. Instead, humans use a convention which requires much less overhead at the cost of allowing occasionally inaccurate communication. We call this convention the principle of implicit confirmation, and can state it as follows:

The speaker assumes his message has been received by the listener, and received correctly, unless the listener indicates otherwise.

This assumption of communication on the speaker's part is analogous to the assumption on the speaker's part that the listener will follow any shift in focus he may make, as noted by Grosz [15]. While this approach is obviously very efficient when there is no difficulty with communication, it places a large burden on the listener. Unless the listener can determine that he has not received the message correctly, the error will go undetected; the conversation will continue uncorrected and may become quite confused. Marx Brothers' comedies exploit such confusions to good effect. This is the basic reason that the human techniques for robust communication can sometimes fail. Fortunately, the listener normally has enough expectations about the sorts of things the speaker might say to make this a rare occurrence.

In order for its interaction to appear natural, a gracefully interacting system must use this same principle of implicit confirmation in its conversation with its user. Given the necessarily limited linguistic abilities of current systems, a gracefully interacting system is, in fact, liable to run into more communication errors in the form of miscomprehension and incomprehension than is a human, and if the system tries to use another convention (such as explicit acknowledgement) to deal with these errors, it will seem most ungraceful to the user. Furthermore, the user will naturally employ implicit confirmation, and unless the system understands this convention and cooperates in it, inaccurate communication and considerable frustration on the part of the user are liable to result. As far as we know, no current dialogue system conducts its dialogues in accordance with this principle of implicit confirmation.

Although we have been using the terms "speaker" and "listener", everything we have said applies equally to dialogue typed at a terminal. Typed dialogues are just as susceptible to incomprehension and miscomprehension as spoken dialogues; the only difference is in the sources of such problems in communication. In typed dialogues, words are not usually misrecognized as they can be with speech; on the other hand, one cannot make spelling errors in spoken dialogue. In what follows, we will continue to blur the distinction between the two forms of dialogue; our discussion will be ostensibly of spoken dialogue, but all the points we make will apply equally to typed dialogue unless otherwise noted.

In the next subsection, we will discuss details of how the speaker can tell if his message fails to be received at all; the two subsequent subsections discuss two simple methods by which the listener can indicate his lack of comprehension or lack of confidence in his comprehension. In all three cases we will comment on the extent to which the techniques discussed are used in current dialogue systems, and any modifications that are likely to be required in a practical gracefully interacting system, particularly those required by the limitations of the current state of the art in language understanding. The final subsection is a summary.

2.2. Implicit Acknowledgements

In this section we consider how the speaker can tell if his message was received at all, whether correctly or incorrectly, and what he can do if it was not received. As the principle of implicit confirmation states, the speaker assumes his message was received unless the listener indicates otherwise. How can the listener indicate that he has not received a message which we are assuming he does not know has been sent?
Obviously, he cannot do this actively or explicitly; instead, he does it tacitly by not doing anything, i.e. by not replying. To put it another way, the speaker expects a reply, and if he does not receive one he will assume that his message has not been received at all. This expectation provides a way, consistent with the principle of implicit confirmation, of detecting communication errors in which the listener fails to receive any of the speaker's message.

In general human dialogue, replies need not be linguistic; for instance, the listener might simply perform an action that the speaker requested. However, for the purposes of graceful interaction, in which conversations are either typed or spoken without any visual communication (as over a telephone), we may confine ourselves to purely linguistic replies. The reply can come either when the speaker has finished delivering his message and has paused to await a reply (typed input systems operate exclusively this way) or by an interruption before the end of the message. If no reply is forthcoming, the speaker will assume that his message has not been received.

One option with a message that has not been received is to repeat it; the more usual and, if repetition fails, eventually necessary strategy is to assume that there is some fault in the channel of communication, and to try either to re-establish communication or to confirm that the channel is defunct. When the speaker is in the presence of the listener, this attempt might be a shouted "can you hear me"; the listener might be asleep or deaf. Over the telephone, a much more common situation, the attempt uses phrases such as "Hello!", "Are you there?", "I can hear you, can you hear me?". If the original speaker receives suitable replies to the channel-checking utterances, the substantive part of the dialogue can proceed. If the channel is indeed defunct, the dialogue is at an end. Similar but more conventional forms are used at the beginning of a dialogue to ensure that the channel is indeed open and available for communication.

We have said that a reply to a message is enough to convince the speaker that the message was indeed received. Are there any constraints on the form of the reply, or will any reply do? Clearly any reply of the channel-checking form will not confirm communication, but will actively indicate failure of the message to get across:

S: The number for Joe Smith is 5267
U: Hello! Are you there?
S: Yes! Can you hear me?

The appropriate response is to participate in a channel-checking dialogue with the user. Apart from this case, there are very few other responses which would not be taken to imply reception (correct or incorrect) of the message. It is not necessary to obtain an answer to a question; counter questions and even quite distantly related statements serve as replies:

U: What is the number for Joe Smith?
S: Do you know the address?
U: Sorry! I mean Bill Smith.

If the listener gives a reply that appears completely inappropriate to the original speaker, he is more likely to assume that the listener received the original message incorrectly, rather than not at all.

A gracefully interacting system should, of course, conform to the human conventions about implicit acknowledgements, i.e. it should reply to all of its user's utterances within a short period (possibly giving a time-filling reply or request to "hold on" if it cannot give a substantive reply soon enough), and it should expect its user to reply to it in a similarly short time. If this expectation is not met, it should be able to initiate and complete a channel-checking dialogue, and if possible, return to its original utterance. Of course, it must also be able to suspend this kind of "time-out" expectation if the user asks it to wait, and naturally there is no need to limit the "patience" of such a system very tightly.

The importance of replying to the user within a short time of his input has been stressed in the LIFER system [20]. The immediate replies, however, took the form of progress reports on the processing of the input, and essentially served as time-fillers. Descriptions of other systems such as PARRY [30] and PLANES [40] have stressed the need for rapid replies, but these systems do not attempt to monitor the speed of their own replies and produce a time-filling reply if it is too slow. One version of the SCHOLAR system [4] did, however, apologize if it took too long to respond. No systems to our knowledge are able to conduct a channel-checking dialogue.

2.3. Explicit Indications of Incomprehension

The principle of implicit confirmation requires a listener to give an explicit indication whenever he fails to comprehend a message or suspects that he may have received a message incorrectly. In this section we deal with the most straightforward way a listener can do this: by asking the speaker what he said or meant.
Such questions vary according to how much of the original utterance the listener thinks he understood, how informative he tries to be about the nature of the problem, and how much he wants to influence the speaker's subsequent reply.

The least informative way of indicating incomprehension, and indirectly of asking the speaker to repeat his message, is through a phrase such as "I beg your pardon?" or "What was that?" If the lack of understanding was caused by a transient problem such as noise on the channel or inattention on the part of the speaker, this strategy may result in a second and successful attempt at communication. If, on the other hand, the failure was caused by a non-transient problem such as the listener's unfamiliarity with the words or construction used (uncommon with human beings, but likely to occur frequently with computer systems), a second attempt at communication is no more likely to be successful than the first. Since the original speaker is given no clue to the nature of the difficulty the listener is having in understanding what he is saying, he is unable to correct that difficulty except by changing what he is saying in a hit and miss fashion.

Successful communication is more likely to result if the listener can provide some clues to the nature of the problem, and the conventions of human dialogue make it easy to do so. Indeed, we believe that the provision of information to the original speaker about where and what he is failing to communicate is fundamental to the extraordinary robustness of human conversation.
Without the ability to zero in on problems, human conversation could not achieve robustness in the apparently effortless way that it does.

The specificity of the clues the listener can provide to the speaker about the nature and location of a communication difficulty depends on the degree to which the listener failed to understand the utterance. If he understood nothing at all, the only clue he can provide is what he expected the speaker to say:

    S: This is directory assistance.
    U: <GARBLE>
    S: I didn't quite catch that; can I get a number for you?

If he has understood something, he can show what he has understood while indicating that he has not understood fully:

    U: What is the number for <GARBLE>?
    S: Whose number did you want?

    U: What is the number for Jim <GARBLE>?
    S: Jim who?

This allows the original speaker to concentrate on transmitting only the part of the message that was not understood, making "Smith", for instance, a suitable reply in the second example. If the listener has understood enough to narrow the possibilities down to a small number he can list them.

    U: What is the number for <?>ill Smith?
    S: Did you say Bill Smith or Phil Smith?

This strategy allows the original speaker to solve the communication problem with a phrase like "the second one", which is presumably much less susceptible to error than repetition of the part of the communication that originally caused the error ("Bill" or "Phil").
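The three strategies just described can be sketched in code. The following is only a minimal illustration, not an implementation taken from any system discussed here: the function name `clarification_question`, its parameters, and the exact wording of the generated questions are assumptions made for the example. It chooses the most specific question that the listener's partial understanding permits.

```python
from typing import Optional, Sequence


def clarification_question(
    candidates: Sequence[str] = (),
    partial: Optional[str] = None,
    expected: str = "can I get a number for you?",
) -> str:
    """Choose the most specific clarification the listener can offer.

    candidates: a short list of possible readings (hypothetical input).
    partial:    a fragment that was understood, e.g. a first name.
    expected:   what the listener expected to hear, used as a last resort.
    """
    if 2 <= len(candidates) <= 3:
        # Few possibilities remain: list them so the speaker can pick one.
        return "Did you say " + " or ".join(candidates) + "?"
    if partial:
        # Something was understood: show it and ask only for the rest.
        return f"{partial} who?"
    # Nothing was understood: fall back on the listener's expectation.
    return f"I didn't quite catch that; {expected}"


print(clarification_question(candidates=["Bill Smith", "Phil Smith"]))
print(clarification_question(partial="Jim"))
print(clarification_question())
```

The cut-off of three candidates is arbitrary; a real system would presumably tune how many alternatives it is willing to read out.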
In general, the information given by these various strategies to the original speaker allows him to concentrate on imparting the information that did not get across, and thus makes for much greater directedness and robustness in the resolution of communication difficulties. As a final strategy, if the listener has produced an interpretation for the utterance, but is uncertain of that interpretation because of noise or because the interpretation is incongruous, he can ask explicitly if his complete interpretation is correct.

The way in which incomprehension is indicated can influence the form of the subsequent clarification. This fact is of particular importance for a gracefully interacting system with limited power of comprehension. It is not sufficient for such a system merely to indicate incomprehension, or even to indicate it informatively, so that its user is made clearly aware of the nature of its difficulty in comprehension; it should also indicate its incomprehension in a form most likely to elicit a clarification from the user that the system can comprehend. For instance, as Codd [6] points out, questions of the form "What do you mean by ..." are liable to elicit a description on a level much too sophisticated for such a system. Thus if the system does not understand "extension" in:

    What is the extension for Jim Smith?

it is unwise to reply:

    What is an extension?
but better to ask:

    Do you want Jim Smith's number or his address?*

Both replies indicate the problem word exactly, and are thus in some sense equally informative about the nature of the problem in comprehension, but the second is preferable since it encourages the user to use either "number" or "address" in his reply, and hence give the system a good chance of understanding it.

    * It is, of course, highly desirable that the system should note the reply to this question, and remember it if the user ever says the word "extension" again, since repetition of such a question would surely be very frustrating to a user. Single word learning of this sort has been discussed by Carbonell in connexion with his Politics system [3]. In general, the problems of learning from mistakes and adapting to the idiosyncrasies of individual users are aspects of graceful interaction we have chosen not to deal with.

While a gracefully interacting system should try to shape its user's responses, the form of the responses cannot be guaranteed. In particular, the system cannot pose multiple choice questions to the user and expect the user always to pick one of the choices given, as do many current systems including Codd's RENDEZVOUS system [6]. Multiple choice questions are contrary to the whole spirit of graceful interaction and sequences of them would soon frustrate a user. A gracefully interacting system can and should be able to present alternatives to the user as in the last example, but it must be prepared for the user to express his choice in his own way or to come up with an entirely different option.

    Do you want Jim Smith's telephone number or his address?

Examples of reasonable responses include:

    The number.
    The first one.
    Both.
    I want to phone him.
    Do you know where he is now?
    I want his social-security number.
    I want the number, but I meant Joe Smith.

This raises the possibility that the user's clarification will also be misunderstood, and of a sequence of misunderstandings and clarifications failing to converge. In such cases, a gracefully interacting system will have to become more and more explicit about the nature of its problems and its limitations and rely on the user to accommodate himself (see Section 5.6).

In short, a gracefully interacting system must be able to express its own incomprehension of or uncertainty about what the user said, and be able to deal with the user's reports of his own problems in comprehending the system. While most systems can indicate incomprehension in some way, most are uninformative and undirective in the senses we have discussed. As far as we know, no current system can respond appropriately to explicit complaints of incomprehension by its user. Clearly, formulating adequate replies to such complaints would require a system to keep track of the things it said and the reasons for saying them.

2.4. Echoing and the Use of Fragmentary Recognition

One restriction associated with explicit indications of incomprehension is the expectation of a reply. In this section we discuss a technique, called echoing, which the listener can use to confirm his interpretation of a message, but which does not require a reply from the original speaker if the interpretation is correct; a reply is needed if the interpretation turns out to be incorrect.
Echoing is thus an efficient way of confirming interpretations of which the listener is fairly sure, but is less useful for cases in which the listener is less certain. We also mention another way of reducing the number of explicit indications of incomprehension - by using fragmentary recognitions of utterances as the basis for complete interpretations. Such interpretations can never, of course, be guaranteed to be correct, and they should be confirmed through echoing.

If the original speaker receives a message indicating that his message has not been completely or confidently understood, he will normally try to clarify or confirm his original message. In particular, an explicit request for confirmation cannot be answered implicitly:

    U: What is the number <?>ter Smith?
    S: Did you say Walter Smith?
    U: Can you also give me his address?

This example is unnatural unless the first speaker says "yes" before asking for the address. It would become tedious to answer too many explicit requests for confirmation, so it is fortunate that human dialogue provides a method of asking for confirmation in a way that allows an affirmative response to be given implicitly. When the listener wants to be sure that his interpretation of the speaker's utterance (or more commonly part of it) is correct, he has only to echo his interpretation. If the original speaker does not comment on the echo, then he implicitly confirms it. Thus:

    U: What is the number for <?>ter Smith?
    S: Walter Smith ...
    S: His number is 5592.
    U: 5592. Thank you.
    S: You're welcome.
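The exchange above can be sketched as a pair of small functions. This is an illustrative assumption rather than a description of any existing system: the names `echo` and `confirmed_after_echo` are hypothetical, and treating a leading "no" as the only form of correction is a deliberate simplification.

```python
from typing import Optional


def echo(interpretation: str) -> str:
    """Form a direct echo of the uncertain part of the input."""
    return f"{interpretation} ..."


def confirmed_after_echo(user_reply: Optional[str]) -> bool:
    """Return True if the echo counts as confirmed.

    Silence (None) or any reply that is not a correction confirms the
    echo implicitly. Only a reply whose first word is "no" is treated
    as a correction here, which is a crude assumption.
    """
    if user_reply is None or not user_reply.strip():
        return True
    first_word = user_reply.strip().lower().split()[0]
    return first_word.strip(",.!") != "no"


print(echo("Walter Smith"))
print(confirmed_after_echo(None))
print(confirmed_after_echo("No, Peter Smith"))
```

The point of the sketch is the default: absence of a correction is taken as confirmation, so the system need not wait for an explicit "yes".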
Of course, the original speaker still has the option of confirming the echo explicitly; and if the echo is incorrect, he must indicate that explicitly.

Direct echoes of this type are useful because they can be confirmed implicitly, thus allowing a listener to verify his understanding without disrupting the flow of the conversation by a clarifying sub-dialogue. Nevertheless, a direct echo is still somewhat disruptive by its very presence; if it were not for its verifying function, it would be completely superfluous to the conversation. There is, however, another verifying technique which avoids even this small amount of disruption. The technique, which we will call indirect echoing, is to incorporate the assumption to be verified into the listener's next utterance in the normal flow of the conversation. For example:

    U: What is the <GARBLE> for Jim Smith?
    S: The number for Jim Smith is 2597.

In this example, S is able to determine from the linguistic and non-linguistic context that <GARBLE> is almost certainly "number". S then incorporates this assumption into his reply as an indirect echo. The net effect is much the same as for a direct echo ("the number"); if U, the original speaker, does not comment on this indirect echo it is implicitly confirmed, and if he wishes to correct it explicitly, he may do so. The essential difference from a direct echo is the absence of any utterance by S outside the natural flow of the conversation. Interestingly enough, this seems to deny the original speaker the opportunity to confirm the assumption explicitly.

The form of an echo need not always correspond to the form of the utterance which is echoed, but can be a paraphrase of it.
The following examples show paraphrased echoes of the direct and indirect types respectively.

    U: I want to contact Roger Smith.
    S: The number for Roger Smith ...
    S: It's 2597.

    U: I want to <GARBLE> Jim Smith.
    S: The number for Jim Smith is 2957.

A number of question answering systems, including RENDEZVOUS [6] and COOP [23, 28], have adopted the policy of presenting the user with a paraphrased version of each input, generated from the system's internal representation of that input. This gives the user a chance to find and correct any misinterpretations made by the system during its interpretation of his input. However, the unconditional generation of paraphrases cannot be considered graceful interaction, and quickly becomes tedious for the user. Direct echoes, whether paraphrases or not, should only be generated when there is some significant degree of uncertainty in the interpretation. Selective paraphrase generation along these lines, which of course involves deciding when uncertainty exists, has been attempted by Carbonell in his MICS system [3].

In addition to its use for checking the correctness of interpretations, echoing, especially direct echoing, can also serve as a time filler. In an analysis of protocols of directory assistance conversations, we found that it was common to echo the name being asked for while the number was being found (which typically took several seconds). This is perhaps explainable by a listener's need (discussed in Section 2.2) to reply in order to convince the speaker that his utterance has been received at all.
It appears that echoes in this case served just the same purpose as a phrase such as "Just a second!", indicating that the communication had been received, but that there would be a delay before the real reply was given. It is interesting to note that a full echo of the input is rarely given in such time-filling echoes; in the protocols, people echoed only the name as opposed to any other portions of the request for a number. In doing this they are perhaps conforming to the conventions of the clarifying type of echo, by echoing the part of the input least predictable in the given context, and therefore most liable to misinterpretation.

In order to conduct natural dialogues, and to avoid frustrating its user with a stream of trivial complaints of incomprehension, a gracefully interacting system should conform to the human conventions of echoing; i.e. it should be able to issue echoes (direct or indirect) when it is uncertain in its comprehension, or when appropriate as time-fillers, and should be able to monitor and make use of any corrections its user offers in response. It must also be constantly on the watch for echoes by the user of what it says, and should be ready to issue corrections as necessary. As far as we know, none of these components of echoing has ever been implemented in a dialogue system.

A way more radical than echoing of avoiding explicit requests for clarification is to ignore uncomprehended but inessential elements of what the user said.
In general, it is impossible to tell whether an uncomprehended input is essential or not, simply because it is uncomprehended. However, there are a number of heuristic strategies that the system can try. First, if the incomprehensible element is small enough and is contextually determined as a function word or "noise phrase", it can be ignored. It is often unimportant whether a determiner is definite or indefinite or which preposition is used. Certain larger segments can be ignored through a key-word approach. For instance, as was found in work on GUS [2], users are likely to offer explanations for their desires, so that if any unrecognizable sequence can be found which begins with "because", "since", etc., it is not unreasonable to ignore that sequence as an irrelevant explanation.

A more general way of deciding whether or not a partial comprehension is sufficient is to decide whether the recognized components can be combined into something that the system would find it reasonable for the user to say. For example, if a travel agent system could extract a city and a date from what a user said to it, it could build a request for a reservation to that city on that date. The combination of recognized fragments is, in fact, the basis of the question answering ability of Waltz's PLANES system [40], and other work on the combination of such fragments has been done by Fox and Mostow [11].

None of the above methods of ignoring certain unrecognized elements of an input are close to being foolproof;
they can, and sometimes will, produce fundamental errors in comprehension. For this reason it is important that a gracefully interacting system make clear what assumptions it is making and what the results of its comprehension really are. It would be extremely tedious for the user if the system always made explicit requests for confirmation of everything it was unsure of, particularly if the language capabilities of the system were less than complete as we might expect for systems in the foreseeable future. A heavy use of direct echoes is little better, since the user would scarcely feel that excessively parrot-like behaviour was very graceful. A more palatable alternative is for the system to incorporate the assumptions it makes into its reply through an indirect echo. For example, if all a travel agent system could extract from:

    I am interested in paying a visit to Pittsburgh on or around May 17

was "Pittsburgh" and "May 17", then its reply could be:

    About what time on May 17 would you like to go to Pittsburgh?

This indirect echo, if uncorrected by the user, will confirm both the system's interpretation of the fragments and its assumptions about their relationship (it could have been from Pittsburgh instead of to Pittsburgh). The notion of always making the system's assumptions explicit is present in the work of Codd [6]; however, Codd suggested that his system should present its understanding of each of the user's requests for explicit approval by the user before proceeding to satisfy the request.

2.5.
Summary of Robust Communication

We can summarize the techniques of robust communication, and their implications for gracefully interacting systems as follows:

- Human dialogue ensures accurate communication without the overhead of explicit acknowledgement by using the principle of implicit confirmation, which allows the speaker to assume his message has been received correctly by the listener, unless the listener indicates otherwise.

- A speaker expects his listener to reply. If the listener does not reply within a very short time, the original speaker will assume his message has not got across, and initiate a dialogue to check the channel of communication. From the point of view of the principle of implicit confirmation, the absence of a reply serves as a tacit indication by the listener that he has not received the speaker's communication. A gracefully interacting system must be able to: conform to the convention of immediate reply, if necessary with a time-filling reply; initiate a channel-checking dialogue if its user does not reply; and participate in a channel-checking dialogue started by the user.

- The simplest way for the listener to indicate that he has not received the speaker's message correctly is to ask him what he said or meant, while possibly at the same time indicating the nature and source of the problem in communication. Such an indication requires a reply by the original speaker. Different ways of asking the question tend to constrain the reply to a greater or lesser extent. The necessarily limited linguistic abilities of a dialogue system suggest that it should ask the most constraining questions it can in such situations.
- When a listener thinks he has understood what the speaker said, but is not quite sure, he can echo (the tentative part of) what he thought he heard, either directly in a separate utterance, or indirectly by incorporating the echo into his next utterance in the normal flow of the conversation. If the original speaker does not correct the echo, it is implicitly confirmed. Echoing thus allows the listener to confirm his interpretation, without requiring a reply from the speaker. A dialogue system can avoid many trivial questions by using this technique, and being ready to accept any corrections that are given. In any case, a system must monitor its user's echoes of what it says, and be prepared to correct any mistakes.

- Other explicit indications of incomprehension, with their requirements for replies, can be avoided if the system guesses what the user said on the basis of partial or fragmentary recognition of the input. Assumptions made in this way must be echoed to avoid confusion.

3. Flexible Parsing

The language used in naturally occurring dialogues between humans deviates in many ways from commonly accepted standards of grammaticality. There may be incorrect prepositions, inflexions, or lack of agreement; utterances may be elliptical, fragmentary, or idiomatic; there may be omitted or repeated words or phrases; utterances may be broken off and restarted at any point.
These deviations occur most frequently in spoken dialogues, but are also sufficiently common in typed conversations that any computer system which attempts to engage humans in graceful natural language dialogue of the typed or spoken variety must be able to deal with them.

Unfortunately, most current parsing methodologies are not well suited to the analysis of such deviant input. Most systems analyse linguistic input in accordance with their expectations, grammatical and otherwise, which cannot practically be extended to cover all, or even most, of the possible deviations; input failing to conform to these expectations, even by so much as a single word, is typically totally incomprehensible to such systems. In short, most current parsing systems are quite inflexible in the face of deviant input.

By flexible parsing we mean parsing which can deal with input deviating from grammatical norms and/or the grammatical expectations of the system doing the parsing. Given the common occurrence of grammatical deviations in naturally occurring dialogues, flexible parsing is clearly of prime importance for graceful interaction. In the following subsection we describe various ways in which language in natural dialogues can and does deviate from the grammatical norm; in a second subsection, we consider current parsing techniques, the way in which they have been used to deal (or fail to deal) with these deviations, and their potential in this regard.

3.1.
Grammatical Deviations

In this section we describe some of the most common types of deviations from standard grammar that arise in natural human conversations: idioms, fragmentary utterances, omissions, repetitions, insertion of noise phrases, small grammatical errors, and ellipsis.

Idioms: Everyday language contains many idioms, i.e. phrases whose interpretation cannot be obtained by using the components of the phrase in the usual way ("a wild goose chase" has very little to do with geese). Even though the structure of idioms is often grammatical, it is unhelpful to parse them, since by definition their meaning cannot be obtained from their components; it is necessary to interpret them as a whole.

Many other phrases, while not strictly speaking idiomatic, can often conveniently be dealt with as a whole. It is possible, for instance, to analyse:

    Could you give me the number for Joe Smith

as a conditional question about ability, and from that to recognize it as an indirect speech act making a request for information. On the other hand, a directory assistance system would probably find it more convenient to interpret "could you give me" directly as a request for information.

Fragmentary utterances: Humans are adept at piecing together fragments of an utterance to come up with an interpretation of the utterance as a whole. For people, the need for such fragmentary recognition occurs mainly when noise of some sort or lack of clarity in pronunciation has made it impossible for them to hear all of an utterance, or when a speaker has actually spoken in disjointed fragments.
In the case of computer systems, a more frequent need for fragmentary recognition might arise through ignorance of vocabulary or constructions. In either case, dealing with such fragmentary input, as humans seem able to do, requires an ability to extract the largest recognizable fragments from the input, and to use those fragments to construct a reasonable interpretation.

In many cases, if it is possible at all to construct a reasonable interpretation of recognized fragments, then that interpretation will be the correct one. For example, if the fragments "give me" and "the number for Joe Smith" were recognized out of:

    Would you be so kind as to give me your listing of the number for Joe Smith

then the obvious interpretation would be the right one. On the other hand, fragments can be pieced together incorrectly, as in the same limited understanding of:

    I asked you to give me the number for Joe Smith, but I meant Fred.

Thus, allowing fragmentary recognition raises the question of when the uncomprehended portion of an utterance can be safely ignored, and when it cannot. There is surely no solution to this problem that is guaranteed always to be correct, but as discussed in Section 2.4, the techniques of robust communication are helpful when an interpretation built from fragments turns out to be wrong.

Omissions, repetitions, and noise phrases: It is possible to omit and repeat words or phrases or insert noise phrases into an utterance without significantly decreasing its intelligibility as in:

    What the the number for lemme see Joe Smith?
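One crude way to tolerate the repetition and the noise phrase in the example above is to strip them before any further analysis. The sketch below is an assumption for illustration only: the `NOISE_PHRASES` list and the function name `strip_noise` are hypothetical, and the omitted "is" is not restored.

```python
from typing import List

# Assumed noise phrases, chosen only to cover the examples in the text.
NOISE_PHRASES = ("lemme see", "er", "um")


def strip_noise(utterance: str) -> List[str]:
    """Drop known noise phrases and immediately repeated words."""
    words = utterance.lower().rstrip("?.!").split()
    cleaned: List[str] = []
    i = 0
    while i < len(words):
        for phrase in NOISE_PHRASES:
            parts = phrase.split()
            if words[i:i + len(parts)] == parts:
                # A noise phrase starts here: skip over all of it.
                i += len(parts)
                break
        else:
            # Keep the word unless it repeats the word just kept.
            if not cleaned or cleaned[-1] != words[i]:
                cleaned.append(words[i])
            i += 1
    return cleaned


print(strip_noise("What the the number for lemme see Joe Smith?"))
```

A filter of this kind will also delete legitimate repetitions, so it is at best a first approximation to what human listeners do.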
Of course, function words have the least effect, but in the proper context content words or phrases may be omitted or repeated without loss of comprehension. Omission and repetition are much more of a problem in speech input than in typed input (though they do occur in the latter). The noisiness of a speech signal can often effectively cause omission of words or parts of words (particularly small function words); while repetition commonly occurs as a person repeats (or worse, replaces) words, or sequences of words as he is producing an utterance as in:

    What is er could er you g...give me the number er the extension for Joe Smith?

This written representation does not indicate the prosodic features - rising and falling intonations, pauses, and stress - present in such utterances; these features may be very important in human comprehension of utterances containing such repetition, breaking-off, and restarting, as well as in comprehension of more fluent utterances.

Grammatical Errors: It is not uncommon, through carelessness or ignorance, to make mistakes in the use of language without materially affecting comprehensibility. Examples are incorrect tenses, lack of agreement between subject and verb, using the wrong preposition, and, for typed input, misspelling. Humans appear to be remarkably unaffected by these errors in their language understanding, often not even noticing that errors have been made.

Ellipsis: It is not uncommon in dialogue to say things that would be meaningless outside the context of the dialogue:

    U: What is the number for Mr. Smith?
    S: Do you mean Joe Smith or Fred Smith?
    U: Joe

In this interchange, "Joe" is said to be an ellipsis of the more complete "I mean Joe Smith". Needless to say, humans are seldom confused by such omissions, and are able to use the linguistic and non-linguistic context to understand the elliptical utterance without difficulty. There is no clear indication whether human resolution of ellipsis involves the construction of a more complete linguistic form.

3.2. Implications for Language Analysers

Having presented some of the ways in which naturally used language can deviate from grammatical norms, we turn to the question of how current parsing techniques can cope with such deviations, both in practice and in theory. We will consider a whole range of parsers including traditional parsers using fixed grammars of the top-down left-to-right variety, both syntactic and semantic, conceptual parsers, pattern-matching parsers of various degrees of sophistication, and parsers developed specifically for use with spoken input. There is no intention, however, to survey the field of parsing, so references to existing systems will be cursory, and will assume prior knowledge on the part of the reader.

The most common parsing technique in use today is top-down left-to-right parsing based on a fixed grammar. This technique is usually implemented by an augmented transition network (ATN) of the type developed by Woods [45], or some close variation; such parsers can parse in terms of either general grammatical categories as in the LUNAR system [46], or in terms of categories having some significance within a restricted domain of discourse as in the LADDER system [33].
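To make the technique concrete, here is a toy top-down, left-to-right recognizer with back-up over a small fixed grammar. It is a plain context-free recognizer, far simpler than an ATN, and the grammar, the category names, and the function names `expand` and `accepts` are all invented for illustration.

```python
from typing import Dict, Iterator, List, Sequence

# Hypothetical fixed grammar: each category maps to alternative expansions.
# Upper-case symbols are categories; lower-case symbols are literal words.
GRAMMAR: Dict[str, List[List[str]]] = {
    "QUERY": [["what", "is", "NP"]],
    "NP": [["the", "ATTR", "for", "NAME"]],
    "ATTR": [["number"], ["address"]],
    "NAME": [["joe", "smith"], ["jim", "smith"]],
}


def expand(symbols: Sequence[str], words: Sequence[str], pos: int) -> Iterator[int]:
    """Yield each input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], list(symbols[1:])
    if first in GRAMMAR:
        # Category: try each alternative in turn, depth-first with back-up.
        for alternative in GRAMMAR[first]:
            yield from expand(alternative + rest, words, pos)
    elif pos < len(words) and words[pos] == first:
        # Literal word: it must match the next input word exactly.
        yield from expand(rest, words, pos + 1)


def accepts(utterance: str) -> bool:
    """Return True only if the whole utterance is derivable from QUERY."""
    words = utterance.lower().rstrip("?").split()
    return any(end == len(words) for end in expand(["QUERY"], words, 0))


print(accepts("What is the number for Joe Smith?"))
print(accepts("What is the the number for Joe Smith?"))
```

In this sketch an input either matches one of the derivations the grammar allows or yields no analysis at all; there is no notion of a partial result.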
Parsers of this type are extremely fragile; they typically fail to produce any sort of parse if their input deviates from their grammar by so much as a single word. This property obviously makes them extremely unsuitable for dealing with input with missing or repeated words, grammatical errors, or with fragmentary input.

These parsers are so fragile because of the depth-first way in which their top-down left-to-right algorithms explore the set of potential parses. When a partial parse fails because the next word does not fall into one of the categories allowed by the grammar at that point, the failure is taken to mean that the parser made an incorrect choice earlier; that parse path is abandoned; and a different choice is made at an earlier choice point. Clearly, unless the input conforms exactly to one of the possibilities allowed by the grammar, no parse at all will result. This fail and back-up procedure is so fundamental to the way in which such parsers search their often very large space of possibilities that it is very difficult to modify their algorithms to deal with repeated words, incorrect endings, etc. Weischedel and Black [41] have made some progress in this direction, by arranging to relax predicates enforcing such grammatical constraints as noun-verb agreement whenever a straightforward parse fails, but these tactics can deal with only a relatively limited class of grammatical deviations.

An approach which avoids much of this fragility and opens up the possibility of fragmentary recognition involves the use of a number of specialist subgrammars, each one capable of recognizing a description of one particular type of entity or analysing one particular type of construction. While each subgrammar is applied in the same top-down, left-to-right fashion as before, they are applied independently, so that failure of one to find a parse does not affect the performance of the others. This is the approach taken by Waltz in his PLANES [40] system, which has a subgrammar for each type of entity known to the system; if an input contains references to several different entities known to the system, then each of them is analysed quite independently of the others.

While splitting up the recognition in this way ensures that a single repeated or omitted word or grammatical mistake will not wreck the entire parse, the parsing of each subcomponent still suffers from the same fragility as before, and there are the extra problems of deciding at which point in the input to apply each subgrammar, and of one subgrammar analysing input which should have been analysed by a different one. In addition, while it is desirable to be able to parse fragments if the need arises, it is also desirable to parse utterances as a whole whenever possible; if an utterance is always parsed as a set of fragments, it is always necessary to construct a complete interpretation out of the interpretations of the fragments. In the case of Waltz's system, the domain was sufficiently constrained that the interpretation of a set of fragments was always unique, but in general this will, of course, not be true.
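The independence of specialist subgrammars described above can be illustrated with a minimal sketch. It is not a reconstruction of PLANES: the entity types, the `SUBGRAMMARS` table, and the `recognize_fragments` function are all hypothetical, and a regular expression stands in for each (much richer) subgrammar. The point shown is only that each recognizer is applied on its own, so a failure of one leaves the results of the others intact.

```python
import re

# Hypothetical subgrammars, one per entity type. A regular expression is
# used here as a stand-in for a real top-down subgrammar.
SUBGRAMMARS = {
    "time": re.compile(r"\b(?:at\s+)?(\d{1,2})\s*(am|pm|o'clock)\b"),
    "party_size": re.compile(r"\b(?:for\s+)?(\d+|two|three|four)\s+(?:people|persons)\b"),
    "date": re.compile(r"\b(tonight|tomorrow|monday|tuesday)\b"),
}


def recognize_fragments(utterance):
    """Apply every subgrammar independently and collect what each one finds.

    A subgrammar that fails simply contributes nothing; it does not
    prevent the other subgrammars from recognizing their own fragments.
    """
    text = utterance.lower()
    found = {}
    for entity, pattern in SUBGRAMMARS.items():
        match = pattern.search(text)
        if match:
            found[entity] = match.group(0)
    return found


if __name__ == "__main__":
    # Hesitations and an unparseable time leave the other entities intact.
    print(recognize_fragments("a table for four people at at eightish tomorrow"))
```

As the text notes, such a scheme only localizes fragility: each recognizer is still rigid within its own fragment, and the fragments found must still be combined into a single interpretation.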
More possibility for flexibility in this area of missing and extra words and grammatical errors seems to be offered by the conceptual parser of Riesbeck [32]. Since a parse by that system is organized around and directed by the meaning of the key action or state in the sentence being parsed, the parse could presumably be made much less sensitive to the presence of the correct function words. In practice, this potential flexibility does not seem to have been exploited, and the system still relies heavily on finding, for example, the correct prepositions in prepositional phrases.

A quite different style of analysis is based on pattern-matching or rewrite rules. In the simplest type of pattern-matching approach, interpretation of input is obtained by matching it as a whole against a set of patterns of words. This was essentially the way in which input was analysed by many early AI language processing programs, such as ELIZA [42] and SIR [31]. Early versions of PARRY [8] used an approach barely more sophisticated than this, and pattern matching (of structures more complex than words) was the basis of Wilks' machine translation system [43].

Flexibility in pattern-matching parsing can be obtained by flexibility in the matching process. Many of the simpler pattern-matching systems used variables in their patterns which could match arbitrary strings, but while this resulted in systems which could analyse a very wide range of input, the level of analysis was correspondingly shallow.
More sophisticated forms of flexibility could presumably be obtained by allowing partial matches of patterns, with perhaps restrictions to ensure that some content words were included in the match; it is not hard to see how this could deal with repeated and omitted words, and in some cases grammatical errors. Surprisingly enough, this potential has not been widely exploited; the more advanced version of PARRY [30], for instance, seems to rely on exact matching of very general patterns, rather than inexact matching of specific patterns, to deal with deviant input. Nevertheless, pattern-matching parsing seems to offer the greatest potential for dealing with grammatical mistakes, and missing and repeated words.

A further advantage of pattern matching for the purposes of flexible parsing is its ability to handle idioms and fixed phrases. In fact, since the structure of idiomatic phrases is by definition unhelpful in their interpretation, some sort of holistic approach such as pattern-matching seems almost obligatory for their interpretation. Indeed, a number of more traditional parsers, including the one for the LUNAR system [46], have a distinct pre-processing stage in which idioms are recognized by a pattern-matching process.

Pattern matching does, however, have its limitations. If one tries to analyse complete utterances by single patterns, there will be commonalities between the patterns, corresponding to the regularities of expression in the domain of discourse. It is these regularities that the more usual type of grammar is designed to account for. The solution within a pattern-matching system is to introduce rewrite rules, substituting the results in place of what is matched. In this way, patterns which account for regularities in the use of auxiliary verbs can be combined with patterns which recognize specific idioms to produce a language analyser, based on pattern-matching, but capable of coping with regularities in a non-redundant way. The power of this approach has been shown in more recent work on PARRY [30].

This uncertainty about where fragments begin and end makes any top-down approach with either a left-to-right or right-to-left directionality inappropriate for fragmentary recognition. A bottom-up approach is indicated. Again, a hierarchical pattern-matching approach seems well suited on these grounds, but it has not been used in practice this way. Because of the special uncertainties of their medium, several speech parsers, including those of the HEARSAY-II [18] and HWIM [47] systems, have dealt with the problem of fragmentary recognition by applying strict grammars in a bottom-up non-directional fashion. The extra robustness achieved by this technique is, however, bought at the price of considerably increased search effort. These systems also suggest that top-down recognition still has a role to play, since the expansion of already recognized fragments in accordance with the grammar creates hypotheses about what exists on either side of the fragments, and if these hypotheses can be validated in a top-down manner, efficiency is greatly enhanced.
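The combination of rewrite rules with bottom-up recognition of fragments discussed in this section can be sketched as follows. This is a minimal illustration under our own assumptions, not the mechanism of PARRY or of any speech system: the rule table `RULES`, the category names (`PERSON`, `NUMBER-REF`, `REQUEST`, `QUERY`), and the `rewrite` function are all hypothetical. Each rule replaces a matched run of tokens by a single symbol, wherever in the input the run occurs, so higher-level patterns can be stated over the results of lower-level ones.

```python
# Hypothetical rewrite rules: (pattern over tokens/symbols, resulting symbol).
RULES = [
    (("joe", "smith"), "PERSON"),
    (("fred", "smith"), "PERSON"),
    (("the", "number", "for", "PERSON"), "NUMBER-REF"),
    (("the", "extension", "for", "PERSON"), "NUMBER-REF"),
    (("what", "is"), "REQUEST"),
    (("give", "me"), "REQUEST"),
    (("REQUEST", "NUMBER-REF"), "QUERY"),
]


def rewrite(tokens):
    """Repeatedly substitute a rule's symbol in place of what it matches.

    Matching is attempted at every position (no left-to-right commitment),
    so fragments are recognized even when the utterance as a whole does
    not reduce to a single symbol. Every pattern has length >= 2, so each
    substitution shortens the token list and the loop terminates.
    """
    tokens = list(tokens)
    changed = True
    while changed:
        changed = False
        for pattern, symbol in RULES:
            n = len(pattern)
            for i in range(len(tokens) - n + 1):
                if tuple(tokens[i:i + n]) == pattern:
                    tokens[i:i + n] = [symbol]
                    changed = True
                    break
            if changed:
                break
    return tokens


if __name__ == "__main__":
    print(rewrite("what is the number for joe smith".split()))
    print(rewrite("could er you give me the number er the extension for joe smith".split()))
```

On the fluent input the rules reduce the utterance to a single `QUERY` symbol; on the hesitant one they still isolate a `REQUEST` fragment and a `NUMBER-REF` fragment, leaving the problem of combining the fragments into one interpretation to a later stage.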
Once fragments have been recognized, it is necessary to construct an interpretation from the fragments. In the PLANES system, it is assumed that there is always exactly one way of combining the interpretation of the fragments to produce a meaningful interpretation; this strategy works because of a highly limited domain of discourse. Other, more general, work on fragment combination by Fox and Mostow [11] deals also with situations in which conflicting fragments have been found.

Most parsers do not deal with ellipsis, the omission from an utterance of details that context can provide (see also Section 6.3). The problem is tackled in a limited way for parsers built by LIFER [20]. Any input not recognized in the normal way is assumed to be an ellipsis of a sentence structurally analogous to the last previously recognized input, and an attempt is made to parse the input according to each of the possible structural analogies. The parser of the GUS system [2] deals with ellipsis in reply to a question (e.g. Q: "When do you want to go?" A: "ten", instead of A: "I want to go at ten"), by inserting an otherwise unrecognizable input into a template generated from the question asked ("I want to go at ..." for "When do you want to go?"), and then parsing the filled-out template in the normal way. Neither mechanism can cope with more general forms of ellipsis.

This general approach to ellipsis is based on the assumption that the "complete" form of the elliptical input must be discovered before the input can be analysed. An alternative approach is to treat elliptical inputs as complete within their context. In practical terms this means predicting possible ellipses according to the context, and attempting to parse them directly from the input. Such a direct approach avoids the possibly unnecessary complication of enclosing the (presumably) important part of an input in a "completing" context only to have to unwrap it again in the analysis. Carbonell [3] has attempted to deal with ellipsis in essentially this way in the context of question answering systems. Of course, not all ellipses can be predicted easily enough for it to be efficient to recognize them this way; an alternative approach, untried as far as we know, might be to deal with such ellipses in the same way as fragmentary input.

3.3. Summary of Flexible Parsing

The language used in naturally occurring dialogues often deviates from strict grammatical norms; common types of deviation include: omitted and repeated words, incorrect tenses, inflexions and prepositions, idiomatic, elliptical, and fragmentary utterances. Current parsing systems typically do not deal with these types of deviant input. In the case of parsers which apply a strict grammar to their input in a top-down left-to-right fashion, whether that grammar is semantically based or purely syntactic, there seem to be fundamental difficulties involved in dealing with input which does not conform to the grammar.
The majority of attacks on deviant input seem to have taken place in the context of pattern-matching parsers, and although this type of parser is less suited to the regularities of language than a more traditional top-down parser, there is reason to hope that these difficulties can be overcome. Pattern-matching parsers are very well suited to deal with idiomatic input, and also lend themselves to the kind of bottom-up analysis that appears necessary to deal with [...] language can be itemized as follows:

- Strict top-down left-to-right parsing schemes, such as those based on ATNs, appear irredeemably unsuited to deal with small grammatical deviations such as repeated or omitted words, incorrect inflexions, etc.

- These problems can be localized, but not overcome, through the use of specialist subgrammars.

- The basic approach of conceptual parsers - fitting an input into a predefined conceptual framework suggested by the main action or state of an input - seems potentially much better able to deal with grammatical irregularities, but has not yet been exploited in this way.

- Pattern-matching parsers are in a similar situation; they have the potential to deal with grammatical irregularities through flexible partial matching schemes, but this potential has been little exploited.

- Idioms and other fixed phrases are best handled by a small pattern-matching approach.

- Fragmentary recognition requires a basically bottom-up approach.

- Speech parsers demonstrate the advantages in efficiency afforded by allowing top-down strategies to be used after a context has been established by bottom-up methods.

- Only limited forms of ellipsis are handled by current parsing systems; all current methods rely on filling in the ellipsed components, and parsing the completed input; a little explored alternative strategy is to recognize ellipses as complete inputs, either by prediction or in the same way as other fragments.

4. Domain Knowledge

In this section we discuss the importance to a gracefully interacting system of knowledge about its domain of discourse; consider a class of domains and related tasks that appear especially appropriate for gracefully interacting systems, and that will figure heavily in our discussions of the remaining components of graceful interaction; and finally, describe a method of knowledge representation appropriate for gracefully interacting systems in such domains.

4.1. The Role of Domain Knowledge

A computer system cannot interact gracefully with its user unless it has substantial knowledge about the domain that the interaction concerns. Such domain knowledge is not, however, a clearly separable component of graceful interaction like those we have discussed; rather, it is a prerequisite or underpinning for the other components. A parser, for instance, requires knowledge of how words and the phrasings in which they are used relate to the underlying domain; for the purposes of robust communication, it is important to know which are the important, information-bearing parts of a user's utterance, and which parts can be safely ignored, even if not fully understood; and an explanation facility clearly requires knowledge of the abilities and mode of operation of the system itself.
For the two components of graceful interaction already discussed, robust communication and flexible parsing, we were able to describe the component in a more or less domain-independent way. Even though examples were necessarily restricted in domain, it is not hard to see how the principles and requirements we established apply to virtually any domain. Nor did our discussion depend at all on the way in which the requisite domain knowledge was represented; we were able simply to assume that it was available.

Neither of these simplifying assumptions holds for the four components discussed in the following four sections: explanation facility, goals and focus mechanisms, identification from descriptions, and language generation. While the components are relevant to and important for an equally wide range of domains as the preceding components, they will be significantly different according to the particular type of domain involved. In discussing them, we will try to give a perspective on them for a range of domains; but we will concentrate our attention on one (quite broad) class of domains that we will call simple-service domains, and which includes both the directory assistance and restaurant reservations domains that we have been following. In the following two subsections, we will describe this class of domains a little more carefully, and consider appropriate representations for knowledge about such domains.

4.2. Simple Services

Many of the simple services provided in current society, especially those offered over the telephone, require in essence only that the customer or client identify certain entities to the person providing the service; these entities are parameters of the service, and once they are identified the service can be provided. We will call tasks which can be cast in this framework simple services. Examples of such services are directory assistance (the person, business or organization whose number is desired is the entity that must be identified) and restaurant reservations (party size, time, date must be identified); in fact, airline and any other type of reservation falls into this mould, along with requests for weather information, current stock market prices, and any other "current value" requests for information. An additional simple service, which for obvious reasons is not available from humans, is provided by an electronic mail system; the parameters for sending an item of electronic mail, for instance, are the source and destination, plus the (unanalysed) body of the message.

Examples of non-simple service domains for gracefully interacting systems include assembly tasks as studied by a group at Stanford Research Institute [39], in which an expert supervises a novice assembling a mechanical device (the obvious role for a gracefully interacting system here is the expert), or more generally any supervisory or instructional task. A complete secretarial service would also be beyond the realm of simple services.
We have singled out simple service domains for two reasons: their commonness, and their tractability. It is clear from the above examples that simple service systems are very common in the real world, so a solution to the graceful interaction problem which was restricted to simple service domains would still be extremely useful. Furthermore, we believe that the graceful interaction problem is tractable in a relatively straightforward way for simple service systems. In particular, we believe that the set of components of graceful interaction that we are proposing is sufficient for systems operating in simple service domains. We still claim that our proposed components are necessary for all gracefully interacting systems, whether operating in simple service domains or not, but other non-simple service domains may require extra skills.

4.3. Representations for Simple Service Domains

Now we turn to the question of how to represent knowledge about simple service domains. The use of conversational systems in simple service domains is far from novel. Two important examples are the GUS system [2], which made round-trip plane reservations between pairs of cities in California, and the PAL system [37], which scheduled appointments from specifications of participants, meeting place, time, etc.* Both systems used frames to organize and represent their domain knowledge.
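As a rough illustration of how a frame might organize a simple service such as restaurant reservations, consider the sketch below. It is not the representation used by GUS or PAL; the `Frame` class, its method names, and the `next_move` function are hypothetical. The entities the user must describe (party size, date, time) are held in named slots, which may be filled in any order and refilled if the user changes his mind; an unfilled slot prompts a question, and a full set of slots is the cue to perform the service.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A hypothetical frame: a service name plus the slots it requires."""
    name: str
    required: tuple
    slots: dict = field(default_factory=dict)

    def fill(self, slot, value):
        """Fill (or refill) a slot; the order of filling does not matter."""
        if slot not in self.required:
            raise KeyError(f"unknown slot: {slot}")
        self.slots[slot] = value

    def missing(self):
        """Slots for which the user has not yet supplied a description."""
        return [s for s in self.required if s not in self.slots]

    def complete(self):
        return not self.missing()


def next_move(frame):
    """Take the initiative on a hole in the frame, or perform the service."""
    missing = frame.missing()
    if missing:
        return f"ask about {missing[0]}"
    return f"perform {frame.name}"


if __name__ == "__main__":
    reservation = Frame("restaurant_reservation", ("party_size", "date", "time"))
    reservation.fill("time", "8 pm")       # slots volunteered out of order
    reservation.fill("party_size", 4)
    print(next_move(reservation))          # a hole remains: the date
    reservation.fill("time", "9 pm")       # the user changes his mind
    reservation.fill("date", "tonight")
    print(next_move(reservation))
```

Because the frame is a plain data structure, the dialogue logic in `next_move` needs no record of the order in which slots were filled; it inspects only what is present and what is missing.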
Frames are a method of knowledge representation, named and popularized by Minsky [29], which seem particularly well suited to simple service domains because their structure reflects the natural structure of a simple service: a service specified through the description of a limited set of entities which serve as parameters to the service.

A frame is a representation of some entity in terms of the entities which make it up; a physical object can be represented in terms of its parts, an event can be represented in terms of its participants, and a more complex event can be represented in terms of its sub-events. The components of a frame are called its slots. The use of frames in simple service systems is thus straightforward; the entities which the user must describe to the system correspond to slots in a frame which represents the service as a whole. The system performs the service by filling the slots according to the user's descriptions, plus possibly performing some manipulations on the completed frame.

* Actually, PAL is not an interactive system. It accepts a monologue which specifies all the information necessary to schedule a meeting. Nevertheless, its domain is simple service and many of the same problems, particularly in the area of focus, arise as for a gracefully interacting system.

The use of frames provides a number of clear-cut advantages for graceful interaction in simple service domains.
First, a frame is a declarative data structure, so that it is easy for a system to manipulate its components in whatever order and however often it desires; this allows a simple service system to deal with a user's descriptions of slots in whatever order they are presented, and to go back and change the entity filling a slot if the user changes his mind. Furthermore, the declarative nature of a frame allows the system easily to keep track of what has been accomplished so far in a conversation, and what still has to be done. If the user fails to volunteer a description for any required slot, the resulting hole in the frame is a signal for the system to take the initiative and ask the user for a description of an appropriate entity. In the same way, a full complement of filled slots can be the system's cue to perform the service and attempt to terminate the conversation. As we will see in Section 6.3, a slot (or the goal of filling it) also provides a good representation for what the attention of the system and user is directed to at any given time. In short, knowledge about a simple service in the form of a frame provides a useful way of organizing the acquisition of information necessary to perform the service.

The slots in a frame are themselves declarative data structures, and so can contain pointers to information other than their actual fillers. In particular, slots commonly refer to other frames which describe the entities that may fill them; thus, the person slot of a directory assistance frame might refer to a frame describing a person with slots for surname, first name, title, etc.; in turn a surname might be described by a frame with slots for individual letters. These references to other frames serve as type information to help the system decide in which slot a given description should be placed. In addition, the potential they provide to represent the grain of descriptions to essentially arbitrary levels of fineness (Jim Smith; Jim; a person whose first name begins with "J") is important for the focus and identification from description components, as we shall see.

Another type of reference found in slots is to procedures to be invoked when a filler is found for a slot or when an attempt is made to find a filler. This technique, called procedural attachment, was heavily exploited by the GUS system to allow transmission of fillers from one slot to another and to provide special strategies to fill selected slots. As an example, consider an electronic mail system which, like ARPAnet mail, provides separate FROM and SENDER fields; if no FROM is specified, then it is the same as the person composing the message; if a different FROM is specified, then SENDER is the same as the composer. A procedure to transmit the default contents of FROM to SENDER if a different FROM were specified would clearly be very useful in this case.

A third type of information that could be attached to slots is temporary information used in establishing the contents of the slot.
For instance, if a system had only partially understood a user's description of a slot, it could store in the slot what it had understood, plus perhaps some hypotheses about what it had not understood, the better to understand the user's reply to its request for clarification. In addition, it is sometimes necessary to maintain temporary information about alternative slot fillers, for instance, when the user's description of a slot is ambiguous (see Section 7.2).

In summary, frames provide a natural representation for information about simple service domains: both a priori knowledge about the domain itself and information important to a system in interacting with the user to formulate a request for service. More specifically, the advantages of a frame-based representation for a simple service domain include:

- an easily manipulable representation for the overall task of interacting with a user to formulate a request for a simple service; the structuring of a frame into slots makes it easy to tell:

    - what parts of the request have been formulated,

    - what vital parts are missing,

    - and to what part the conversation is currently directed.

- convenient storage associated with each frame slot for information relevant to that slot including:

    - information about the type and structure of the entities supposed to fill the slot,

    - procedures to transmit fillers appropriately from one slot to another and provide special strategies to fill selected slots,

    - default fillers,

    - temporary information about partially described and alternative fillers.

In addition, frames have already been used successfully in systems that provide simple services.

5. Explanation Facility

A gracefully interacting system must be able to deal reasonably with any questions that its user sees fit to pose. In this section, we consider a limited set of question types that appear to us the most important for gracefully interacting systems, especially those operating in simple service domains.* The set includes questions about the system's abilities, its actions and the motives for them, other events within the system's experience, and hypothetical questions based on these types. We call the ability to answer questions of these types appropriately an explanation facility.

* This impression is based on an analysis of some human interactions in such domains.

The set of types does not cover all reasonable questions, even for simple service domains, so that a system dealing only with these types could well find itself faced with questions that it cannot answer, or even comprehend. To deal with such cases, and also with the other legitimate cases of complete incomprehension, we describe a technique, related to the explanation facility, by which a gracefully interacting system can extricate itself from situations in which its inability to comprehend has led to a totally confused dialogue.

5.1. Explanation Types

The explanation facility we propose deals with four different question types:

- Questions about ability:

    Can you give me a number in Tokyo?
    Can I make a reservation?
    How can I make a reservation?
    Can you give me a reservation for next Tuesday?
    What places can you give listings for?
    What can you do?

  Despite the fact that many questions ostensibly about ability are in reality requests to perform the embedded actions (with the service provider as the agent instead of "I" in appropriate cases), some ability questions must be answered literally. This requires a gracefully interacting system to have an explicit model of its own abilities and (some of) its inabilities.

- Questions about events:

    What did you just say?
    Why do you need to know my name?
    Have you made a reservation for Mr. Smith?
    Why can't you make the reservation for eight?
    Will you hold the reservation against late arrival? Why not?

  Dealing with such questions clearly requires a memory for what has already occurred in a conversation, together with knowledge about the goal structure of the conversation as described in Section 6.2.

- Hypothetical questions (embedding either of the two preceding types):

    Can you give me the number if I give you the address?
    Could you give me a reservation tonight if the party size was three?
    Will you hold the reservation if I put down a deposit?

  Answering such questions requires an ability to construct representations of the hypothetical conditions without getting them confused with the current situation or the user's previously stated preferences.

- Factual questions:

    Where are you located?
    Is there a charge for this service?
    What are your opening hours?

  These questions typically involve non-systematic domain-dependent information, and must be answered on an individual basis.
The repertoire of question types that we propose to deal with in our explanation facility is very far from a full spectrum of all the types of questions that can occur in ordinary human dialogue or discourse; this is clear from the range of question types considered in the work of Charniak [5], Lehnert [24], Scragg [35], and others. Nevertheless, we believe that such an explanation facility, together with the method for dealing with confused dialogues discussed below, is sufficient for a gracefully interacting system in a simple service domain. In particular, the explanation facility proposed here is of much greater scope than that provided in more conventional question-answering systems such as LADDER [33], PLANES [40], LUNAR [46], etc., which have tended to concentrate on answering factual questions; the factual questions that such systems answer can, however, be much more complicated and are dealt with in a much more systematic way than we are proposing here.

In the remainder of this section, we will consider the various question types in more detail. The next two subsections will deal with ability questions, first in general human conversation, and then in restricted domains. The next two subsections deal with event and hypothetical questions respectively. Finally, we discuss a technique based on the explanation facility by which a gracefully interacting system can extricate itself from situations in which its inability to comprehend has created confusion.

5.2.
Questions About Ability - Indirect Speech Acts

In general human dialogue, second person questions about abilities can be interpreted in two quite distinct ways. They can either be taken literally ("Can you swim?"; "Can you lift this bar-bell?"), or be interpreted as requests for the listener to perform the action embedded in the question ("Can you open the window?"; "Can you tell me your address?"). The distinction between the two modes of interpretation does not depend on the question itself, but rather on the context (both linguistic and non-linguistic) in which it is spoken. The examples given above to illustrate the two modes of interpretation could all be interpreted the other way in a suitably chosen context; the intended interpretations are simply the more common ones.

The difference between the two modes of interpretation is accounted for by the linguistic theory of speech acts [1], [36]. In brief, the theory says that the listener will interpret an ability question as a request to perform the embedded action if he believes that the speaker already knows whether or not he is able to perform the embedded action; thus, "Can you open the window?" will generally be interpreted as a request, because, unless there are unusual circumstances, the listener will assume that the speaker believes that he is able to open the window. Using an ability question to express a request in this way is called an indirect speech act.*

The mode of interpretation of a second person question about ability is also correlated with the specificity of the question. A non-specific question ("Can you chop wood?
" ) is much more likely to be interpreted as a literal question about abilities, t h a n a m o r e s p e c i f i c q u e s t i o n ("Can y o u chop this wood?"). previously This distinction is somewhat r e l a t e d t o the m e n t i o n e d rule for choosing b e t w e e n the two modes of i n t e r p r e t a t i o n , in that if t h e s p e a k e r w e r e u n s u r e about a general ability of the listener, t h e r e w o u l d b e n o p o i n t in a s k i n g a b o u t a more specific ability. One further point to note is that e v e n though the listener i n t e r p r e t s an ability l i t e r a l l y , t h e a l t e r n a t i v e mode of interpretation may also be relevant. question In particular, if t h e r e is a n y p o s s i b i l i t y that the speaker may wish the listener to perform a p o s s i b l y more specific i n s t a n c e of the action embedded in the question, then the listener is likely to s u s p e c t that t h e s p e a k e r actually d o e s w i s h him to perform such an action. A: C a n y o u c h o p wood? B: Y e s , w h a t do y o u want me to chop? In t h i s e x a m p l e , it is a moot point whether B interpreted A's question literally or not. most i m p o r t a n t point is that B realized that A might want B to c h o p some p a r t i c u l a r The wood, e v e n t h o u g h A ' s q u e s t i o n was more general. Of course, this is not the only kind of indirect speech act. Requests for action, for instance, can also be expressed as questions about preferences ("Do you want to open the window?"), statements about preference ("I would like you to open the window."), and even more obscurely ("It's cold in here.") 31 5.3. Questions A b o u t Ability in a Restricted Domain In t h i s s u b s e c t i o n , w e consider what implications the human methods of dealing w i t h a b i l i t y q u e s t i o n s h a v e f o r a g r a c e f u l l y interacting system. 
It is safe to assume that for the foreseeable future any gracefully interacting system will operate in a highly restricted domain, and be able to provide its users with a highly restricted set of services. Making this assumption allows us greatly to simplify the treatment of direct questions by the user about the system's ability. In brief, if the user asks the system about its ability to do something that it can do, then it is reasonable to assume that the user is asking for the service to be performed; on the other hand, if the user asks about the system's ability to do something that it cannot do, a negative answer is appropriate. Thus, for a directory assistance service:

    U: Can you tell me the number for Jim Smith?
    S: Yes, it's 5629.
    U: Can you connect me to Jim Smith?
    S: No, I'm sorry! I can't.

An interesting consequence of this strategy is that a system requires no explicit model of its abilities to answer ability questions which have positive answers. Since such questions are treated in the same way* as requests to perform the embedded action, the system need only recognize the embedded action and perform it. To do this, the system only needs to be able to recognize exactly those actions that it can actually perform. Clearly, this constitutes an ability model, but of an implicit kind. The chief drawback to such an implicit model is that it treats all ability questions with negative answers in the same way, and furthermore, relies on non-recognition of the action embedded in the question.
It is, of course, unrealistic to imagine that all the embedded actions that the system cannot do can be recognized, but recognition of some of the more likely ones would allow the system to help users with reasonable misconceptions about the system's ability, rather than just giving them a flat negative.

    U: Can you connect me to Jim Smith?
    S: No! You'll have to dial yourself, the number is 5629.

* The treatment cannot be completely identical since "Yes, it's 5629" is not an appropriate response to "What is the number for Fred Smith?", but apart from the format of the response there is no apparent need to discriminate.

Another situation in which an explicit model of ability and/or inability is useful occurs when a service has parameters, and the system can only perform the service for certain values of the parameters.

    U: Can you give me a number in Rome?
    S: No, I have only local numbers.
    U: Can you make me a reservation for next month?
    S: No, we only accept reservations a week in advance.

The second example particularly shows the importance of telling the user the limitations the system places on the parameter; if it just replies in the negative, the user has no way of knowing what prevents his request from being fulfilled, and consequently no way of deciding whether he can modify his request so that the system can help him. Note also that this strategy applies not only to requests phrased as ability questions, but also to requests in any other format.

    U: I would like a reservation for next month.
    S: I'm sorry, we only accept reservations a week in advance.
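The strategy described in this subsection can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Service`, `ABILITIES`, `KNOWN_INABILITIES`, and `answer_ability_question`, the `days_ahead` parameter, and the reply wordings are all invented here. An ability question about a performable action is treated as a request for that action; a parameter outside the permitted range produces a negative answer that states the limit; and a few likely but unsupported actions get a helpful reply instead of a flat negative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Service:
    """One action the system can perform (hypothetical structure)."""
    perform: Callable[[dict], str]
    # Returns a description of the violated limit, or None if params are fine.
    check_limits: Callable[[dict], Optional[str]]


def make_reservation(params: dict) -> str:
    return f"Yes, you have a reservation in {params['days_ahead']} days."


def reservation_limits(params: dict) -> Optional[str]:
    if params["days_ahead"] > 7:
        return "we only accept reservations a week in advance"
    return None


# Explicit model of ability: the actions the system can perform.
ABILITIES: Dict[str, Service] = {
    "reserve": Service(make_reservation, reservation_limits),
}

# Explicit model of some likely inabilities, each with a helpful reply.
KNOWN_INABILITIES: Dict[str, str] = {
    "connect": "No, you'll have to dial yourself.",
}


def answer_ability_question(action: str, params: dict) -> str:
    """Treat 'Can you X?' as a request for X whenever X is possible."""
    if action in ABILITIES:
        service = ABILITIES[action]
        limit = service.check_limits(params)
        if limit is not None:
            # Tell the user which limit blocks the request.
            return f"No, {limit}."
        return service.perform(params)
    if action in KNOWN_INABILITIES:
        return KNOWN_INABILITIES[action]
    # Unrecognized action: the flat negative of the implicit model.
    return "No, I'm sorry, I can't."
```

With these assumed definitions, `answer_ability_question("reserve", {"days_ahead": 30})` explains the one-week limit rather than simply refusing, which mirrors the second dialogue above.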
In discussing questions about ability in human dialogues, we observed that the more vague an ability question, the less likely it was to be interpreted as a request for action. It seems that a gracefully interacting system need not make this distinction. All ability questions with positive answers can be treated as requests for action, while all with negative answers should be answered negatively, giving whatever extra information is appropriate as discussed above.

    U: Can you give me a number?
    S: Yes, whose number would you like?
    U: Can you make me a reservation?
    S: Yes, what time would you like?

As these examples suggest, the requests implied by vaguer ability questions are typically too vague to be fulfilled directly. In other words, the vagueness takes the form of the absence of (or vagueness in) certain parameters of the requested task, without which the task is not well-defined. In the examples above, the user does not specify the name of the person for whom the number is required or the time or other details of the reservation. Appropriate action for the system is not to reject the request because it is incompletely specified, but rather to ask for specifying information, as in the examples above. Of course, the extra information the user goes on to supply may specify a request that cannot be fulfilled, but such problems must be dealt with as they arise. Again, note that formulation of the request as a question about ability has little to do with the system's response; vague requests formulated as other question types, statements, or imperatives should be dealt with in essentially the same way.

There
are exceptions to our proposal for handling all ability questions as requests for action. First, "wh"-questions about ability ("Where can you give listings for?") should generally be answered directly; although if the answer is awkward because of its size or some other reason, it may still be appropriate to treat the question as a request for the embedded action ("Where do you want a listing for?"). Secondly, the extremeness of vagueness in which the action is "do" ("Can you do something for me?", or more naturally in "wh"-form, "What can you do?") should cause ability questions to be treated literally, and make the system give an accounting of its abilities. Clearly, both these exceptions require a system to have an explicit model of its own abilities. They correspond closely to calls on a "help facility" of the type commonly found in current (non-graceful) interactive systems.

To end this section, we might note two points: first, virtually all that has been said above about second-person ability questions applies equally to many first-person ability questions ("What can I do?", "Can I make a reservation for 7 PM?") in which the user places himself in the active role. In fact, the only cases in which the transformation is not applicable are those in which the user was a recipient in the second-person version ("Can you tell me the number for Jim Smith?"). In general, then, first-person ability questions should be treated exactly the same as the corresponding second-person questions. Secondly, "how" questions ("How can I make a reservation?
" , or a v e n "How do I make a reservation?") can usually be t r e a t e d in t h e s a m e w a y as the c o r r e s p o n d i n g ability question without the how, i.e. as r e q u e s t s to p e r f o r m the action specified. " H o w " questions for which this transformation does not produce a r e q u e s t f u l f i l l a b l e b y the system ("How can I get there?", "How do y o u get any c u s t o m e r s at t h o s e p r i c e s ? " ) are either informational questions which should be r e c o g n i z e d s e p a r a t e l y ( s e e S e c t i o n 5.1), or are questions that a gracefully interacting simple service s y s t e m c o u l d not b e e x p e c t e d to a n s w e r satisfactorily. 5.4. Event Questions In this s e c t i o n , w e t u r n from ability questions to questions about events, c o v e r i n g e v e n t s b o t h in t h e past and in the future, and the motives behind them. q u e s t i o n s , the As in the c a s e of r e s t r i c t e d domains of forseeable gracefully interacting s y s t e m s make q u e s t i o n s much more tractable. The only event questions to which a g r a c e f u l l y ability event interacting s y s t e m c a n b e e x p e c t e d to give an informed reply are those concerning its o w n a c t i o n s a n d t h e i r r e s u l t s a n d motives, along with those actions of its users which it can o b s e r v e d i r e c t l y . Q u e s t i o n s a b o u t e v e n t s outside its direct experience will be incomprehensible. E x a m p l e s of q u e s t i o n s that should be dealt with include: W h a t d i d y o u just say? Did y o u just ask for my name? 34 W h y d o y o u want to know my name? W h y w o n ' t y o u make the reservation for 8pm? a n d u s i n g e x a m p l e s from an electronic mail domain: Has the message I cent y e s t e r d a y been delivered successfully? Will this message be d e l i v e r e d immediately? 
The first two examples are explicit indications of incomprehension as discussed in Section 2.3. Such questions about what has just been said serve a robust communication function and should therefore be treated as special cases. Answering other event questions requires a history of all the events that have taken place in the current interaction, plus in the case of the system's actions, the reasons for them. The SHRDLU system of Winograd [44] was the first to maintain such a history of events and their reasons. As the penultimate example shows, event questions can refer to events outside the current interactive session. Presumably, it is impractical to maintain a detailed history of all previous interactions, and instead sufficient to remember only "major" events, such as the delivery of a message in the case of an electronic mail system. Questions to the system about its future actions, as in the last example, require the system to "imagine" that the future has arrived under prescribed conditions and to observe the results; such questions are thus closely related to the hypothetical questions discussed in the next section.

Before leaving this section, we should note that, like ability questions, event questions can also be indirect speech acts. For most yes-no questions, a bare negative answer is generally not an acceptable response, since such questions normally involve an indirect speech act asking for an explanation in the negative case. Thus, for "Was the message delivered successfully?", it is unacceptable merely to reply "No!"; the reason for non-delivery should also be given. Again, "Will the message be delivered immediately?"
would normally be interpreted as an indication that the user not only seeks information, but wishes the message to be delivered immediately. As a third and final example, if the user of a restaurant reservations service says in the middle of his conversation "Did I say there would be eight in the party?", he is probably not even interested in the direct answer to the question, but only in making sure that the system knows there will be eight in the party. The recognition of such speech acts is a difficult problem which has been addressed in more general terms in the work of Cohen [7], Levin and Moore [25], Lehnert [24], and others. Adapting such work to the requirements of graceful interaction, and using all the constraints that simple service domains give, is an important but unsolved problem.

5.5. Hypothetical Questions

Besides being able to answer questions in the context of the current situation, a gracefully interacting system must also be able to deal with questions based on a hypothetical context invented by the user. As usual, dealing with such questions in limited domains is very much easier than for general human conversation, simply because the number of variables about which hypotheses can be made is so very restricted.

In the case of simple service systems, the most important class of hypotheses are those concerning the value of one of the slots of the service or one of the subslots of a slot. A user may ask an ability question based on the hypothesis that such a slot has a certain value, or even that the (unspecified) value of the slot is known to the system.
Examples include:

    Can you give me a reservation if the party-size is three?
    Can you give me the number, if I give you the address?

    U: I'd like a reservation for this evening.
    S: I'm afraid we're fully booked.
    U: Could you give me a reservation for tomorrow?

To answer such questions, a gracefully interacting system must first construct a representation of the hypothesized situation, and then evaluate and answer the question. In doing this, the system must be careful not to confuse the representation of the hypothetical situation with any other conflicting situations the user has previously specified, as in the last example. The user may wish to return to his previous specification, or even to introduce others, and switch between them several times (this might be quite common for, say, an airline reservation system). In other words, the system must have a way of representing alternative situations independently, of sharing common information between them, and of switching back and forth between the alternatives. Such ideas have been explored in knowledge representation systems such as CONNIVER [38] and NETL [10], and in work by Hendrix [21], Hayes [16], and others. The methods thus developed are suitable for use with the frames type of representation proposed in Section 4.3.

Even after a representation for the hypothesis has been constructed, it is not always possible to treat a hypothetical question in the same way as its embedded question would be treated if it were posed in a context corresponding to the situation hypothesized. In particular, the indirect speech act may be different.
Thus "Can you give me a reservation for eight pm tonight?", spoken in a context in which the party size has been previously established, would normally be interpreted as a directive to make a reservation at the stated time if that is possible. On the other hand, "Can you give me a reservation for 8 pm tonight, if the party size is three?" would normally be interpreted as an exploration of possibilities, rather than a request for action, and might well have been preceded by a discussion in which the party-size was different; it should therefore be answered literally. However, once it has answered yes or no, the system should not forget the hypothesis, since the user may then decide to make a reservation based on that hypothesis.

Finally, note that hypotheses can be made about things other than slot values, as in: "If I had called two hours ago would you have been able to give me a reservation?". Answering such a question would require an ability to change the context by restoring the reservations data-base to its state two hours ago. For a practical system, one would have to balance the utility of answering such questions against the expense of facilities for changing the context in the way required.

5.6. Ability Models as a Safety Net

In the preceding sections we have discussed what is essentially a passive explanation facility: one that issues explanations in response to a user's express desires (whether those desires are expressed literally, or in an indirect speech act).
In this section we discuss a more active type of explanation facility: one which volunteers explanations of its own abilities in circumstances in which the user might benefit from such explanations. We only propose one such circumstance here: when the system has repeatedly failed to understand the user and the conversation is in danger of breaking down completely. For a more sophisticated system there might be others, such as when the user appeared to be using a more complicated strategy to accomplish his goals than was really necessary.

We claim then that, when a gracefully interacting system has repeatedly failed to understand its user, it should use its ability model to let the user know what it can do for him, plus possibly what it needs to know to do it. There are two benefits: first, it allows the user to see whether the system can help him in what he wants to do; and secondly, if the system can help, the way in which it states its abilities suggests to the user a phrasing for his requests that the system would be able to understand.

    U: What is the number for Jim Smith?
    S: Jim Smith's number is 2597.
    U: Thank you. Do you know how the weather is where you are?
    S: Are you asking me for another number?
    U: No! I'm asking you about the weather.
    S: I'm sorry! All I can do is give you telephone numbers. I am completely ignorant of everything else.

In this example, the system cannot understand the user's questions about the weather. The first time he asks, it asks a question based on what it expected him to say (this strategy was discussed in Section 2.3), but this does not produce anything more understandable from the user.
The system then assumes that the conversation is breaking down and uses its ability model to inform the user of what it can do for him (and intensionally what it cannot). At this point, the user is fully aware of why the system is not cooperating more directly with him by fulfilling his request for information about the weather, and so is much less likely to become frustrated than if he didn't understand the reason for the problem.

The following example shows how the way the system phrases its statement of abilities can aid the user in formulating his request.

    U: I want to come and eat at your restaurant.
    S: I'm sorry! All I can do is make reservations. I am completely ignorant about everything else.

We are assuming that the system cannot understand the user's request because of the unusual wording, and that this fragment was preceded by other unsuccessful attempts at communication, so that the system has fallen back on its ability model. Now, if the user cares to heed the system's statement of its ability, he might try something like: "Then please make me a reservation.", which the system should be able to understand. Of course, this feature implies that the system can comprehend any appropriately transformed version of its own statement of abilities (see Section 8).

Finally, note that this use of an ability model, while vitally important for a computer system of limited understanding, has little basis in real human dialogues.
The reason for this is clear: humans understand each other so well and have such a breadth of knowledge that conversations hardly ever break down into repeated incomprehensions; so humans do not have any real need for a last resort strategy of the kind we have discussed.

5.7. Summary of Explanation Facilities

We can summarize this section as follows:

- A gracefully interacting system should be able to answer several different types of question from its user, including questions about its abilities, about its actions and the motives for them, about other events in its past experience, and hypothetical questions based on all these other types.
- In human dialogue, second-person questions about ability are interpreted in one of two distinct modes: either literally or as requests to perform the action embedded in them.
- In general, a gracefully interacting system need not make this distinction; second-person ability questions with positive answers can be treated as requests for action, and those with negative answers should be answered negatively. This does not require an explicit ability model.
- However, an explicit model is useful for the negative cases when it is only incorrect parameters that prevent the system from performing the embedded action, and for giving useful information besides a flat negative in other commonly occurring negative cases.
- Other cases in which ability models are important are "wh-" ability questions, and very vague ability questions in which the embedded action is "do."
- Most first-person ability questions can be treated in exactly the same way as the corresponding second-person questions.
- To answer event questions, including those about its own actions and the motives for them, a system needs to maintain a history of all events that have taken place in the current interaction, plus major events from previous interactions, together with its representation of the goal structure of the conversation at the time of those events.
- Event questions, especially those of a yes/no type, can also be indirect speech acts.
- Answering hypothetical questions requires an ability to construct and swop between temporary contexts which represent the conditional parts of the hypothetical questions; the range of aspects of the context which can be manipulated in this way restricts the range of hypothetical questions which can be dealt with.
- A gracefully interacting system can also use its model of its own abilities to extricate itself from totally confused dialogues by telling the user what it can and cannot do for him.

6. Goals and Focus

In this section we deal with the intimately related topics of the goals of the participants in a conversation and what their attention is focused on at any given point in the conversation. Both these items are very complicated in general human conversation and a thorough treatment is far beyond the scope of this paper. Instead, we will largely confine ourselves to dealing with these phenomena in gracefully interacting simple service systems.
We will find that high-level goals can be treated very simply for such systems, since there are very few goals that the system can recognize in the user (essentially only those to make use of or find out about the services provided by the system, plus an undistinguished set of others that the system cannot help with), and since the system has no goals except to help the user satisfy his. Subgoals that arise during the satisfaction of the top-level goals must, however, be treated more generally in order for a system to understand what is happening in a conversation.

The participants in a conversation share a common view of what the conversation is about, which we call the focus of the conversation. This shared focus allows them to economize in what they say through anaphoric reference and ellipsis. It is thus very important for a gracefully interacting system to be able to follow the shifting focus of a conversation in order to interpret its users' ellipsis and anaphora. Goals and focus are clearly highly related in any case, but for simple service systems we find it expedient to equate them. While other definitions of focus have aided in the resolution of anaphora, the view we propose aids also in the treatment of ellipsis. We will describe how this view of focus helps in the resolution of ellipsis and anaphora, and discuss strategies for keeping track of it.

6.1. Goals

Human conversation is very goal-oriented. In other words, virtually everything a person says during a conversation is intended to achieve some definite aim. An ability to determine the goals of its user is thus of prime importance for a gracefully interacting system.
In general human conversation, the types of goals that can be pursued are extremely many and varied. They range from highly specific (trying to find someone's telephone number) to vague (maintaining "interesting" small-talk); from quite straightforward (trying to find out someone's origin by a direct question) to downright devious (trying to manoeuvre someone into pronouncing a certain word so that you will know his origin); from completely cooperative (obtaining a number from a helpful directory assistance operator) to totally antagonistic (a cross-examination at a trial). Fortunately, as we shall see, a gracefully interacting simple service system need concern itself with only a highly restricted subset of these various types of goals.

Not surprisingly, the goals people have affect what they say and the way they say it. Even more importantly from our point of view, the way a listener interprets what a speaker says is dependent upon the listener's view of the speaker's goals. The way in which this happens is the subject of the linguistic theory of speech acts by Austin [1] and Searle [36], and has also been studied in more computationally oriented work by Cohen [7] and Mann, Moore, and Levin [27].

The problems involved in dealing computationally with the full range of goals and speech acts are immense. In both pieces of work cited, it was found necessary to maintain a model of the beliefs of both participants in a conversation, including beliefs about the beliefs of the other participant.
In addition, Levin and Moore [25] found it useful to recognize a number of distinct patterns of expressed goals and characteristic responses; examples included asking for information, requesting help with a problem, and requesting a service.

While the work just mentioned has explored the problem, many unresolved difficulties remain in the recognition and modelling of all the types and patterns of goals that can arise in general conversation. So it is fortunate that graceful interaction, at least for simple service domains, involves the recognition and modelling of only a highly restricted set of goal types. For the remainder of the section we will concentrate our attention on this restricted problem.

A gracefully interacting simple service system can make the following two sweeping simplifications in its modelling of goals:

- it has no independent goals of its own; its only goal is to help the user fulfill his goals;

- the user's goals are either to avail himself of the system's highly limited services, or fall into an undistinguished class for which the system is unable to help the user.

Note that these simplifications apply only to primary or top-level goals; lower-level goals or subgoals which can arise during an attempt to satisfy a top-level goal must be treated more generally, as we will see in the next section.

The two restrictions require little justification. The first follows from the notion of a gracefully interacting system whose reason for existence is to serve its user. Such a system would have no interests of its own to worry about, as does every human participant in a conversation, and so it need have no goals other than those it can recognize in its user. In attempting to satisfy its users' goals, it can, of course, generate subgoals which cause it to take the initiative in the interaction, but such initiatives are always subservient to the user's top-level goal.

The second restriction is appropriate because it is futile for a gracefully interacting system to recognize goals in the user that it cannot help to fulfil. The only goals that a gracefully interacting system needs to recognize in its user are those within its domain of expertise, plus perhaps a few other closely related areas that it explicitly knows it cannot fulfil, but for which it can offer some helpful advice (see Section 5.3). Without becoming significantly less graceful, it can treat all other goals that the user might express with undifferentiated incomprehension. For instance, it would be of little use for a directory assistance program to recognize that its user was trying to find out the state of the weather, or trying to ask it out to dinner, since it would be quite unable to fulfil either of those goals of the user.

Neither of these simplifications is generally true of gracefully interacting systems outside the simple service class. A graceful supervisory or instructional program might, for instance, have to choose between cooperating in fulfilling a user's expressed goal, which it knows to be a bad goal to have, and correcting the user. Again, the range of goals that a user might reasonably express to a secretarial system could be quite large.

In summary, the combination of recognizing a highly limited number of goals in the user, and making the user's goals the system's own, allows the treatment of high-level goals for simple service domains to be extremely simple. The system need have no high-level goals of its own; it just cooperates with the user in fulfilling his goals. Moreover, the system can assume that the goals of the user are either to use (and/or perhaps find out about) the service it offers, or are outside the competence of the system. Thus, graceful interaction in a simple service domain does not require a sophisticated model of its own and the user's goals and motivations, such as would be necessary for a more general interaction.

6.2. Subgoals In Simple Service Domains

While top-level goals can be treated in a very simplified way by a gracefully interacting simple service system, the subgoals that can arise during fulfillment of the top-level goals must be treated much more generally. As we will see, subgoals for simple service systems can be nested arbitrarily deeply, can originate from the system as well as the user, and involve the acquisition or imparting of specific pieces of information or descriptions.

Subgoals arise because top-level goals often cannot be accomplished in a single step. Thus a user may be unable or unwilling to specify all the components of, say, a restaurant reservation (time, party size, etc.) in a single utterance. Instead, he may spread the specifications over several different utterances; we can say that in each of those utterances he is pursuing the subgoals of imparting the specifications of the corresponding components.

Unlike top-level goals, subgoals can originate from the system as well as the user. A user may fail to volunteer specifications of certain parameters of the service, or may specify parameters in an ambiguous or unsatisfiable way (see Section 7). In such cases, the system may take the initiative by formulating and pursuing a subgoal of trying to acquire the appropriate information from the user. If the user of a restaurant reservations system, for instance, does not volunteer the size of his party, the system will have to ask him explicitly. Such system generated subgoals are, nevertheless, derived from and subordinate to the goals of the user, and should not be pursued blindly if the goals of the user change.

There can be several levels of subgoals; that is, subgoals can themselves have subgoals. So far, all our examples have been in terms of the primary parameters of the service offered by a simple service system: subgoals on the part of the user to specify a parameter to the system, and subgoals on the part of the system to obtain the specification of a parameter from the user. In terms of the frames representation described in Section 4.3, the subgoals have been to fill the slots of the frame representing the service. However, these subgoals may themselves not always be achieved in one step. If the system, for whatever reason, does not understand all of a description of a parameter (slot) by the user, it can set up the subgoal of obtaining the missing part of the description from the user.

    U: I would like the number for Jim <GARBLE>.
    S: Jim who?

In this example (see Section 2.3), the user is pursuing the goal of trying to find a telephone number, including the subgoal of trying to identify to the system the person for whom he wants the number. This subgoal succeeds only in part, because the system cannot understand the surname, and so the system sets up a sub-subgoal by explicitly asking the user to redescribe the surname. Note that the subgoal generated by the system is subordinate to the user's original subgoal. Other common examples of subgoals of subgoals occur when a user's description of a slot is ambiguous or unsatisfiable (see Section 7) and the system attempts to resolve the difficulty.

If a goal can have several subgoals which can in turn have subgoals of their own and so on, trees of subgoals of the type common in planning systems (see Hayes [17] and Sacerdoti [34], for example) can be produced. However, since a simple service system does not need to plan an entire conversation out in advance, there is no need, as is usual in planning systems, to generate complete trees of subgoals in advance (though see Section 6.4); indeed, an interaction in which the system insisted that all parameters were specified in a fixed order (and the generation of such a tree would imply this) would be quite ungraceful. More important is the dependency between subgoals and their parent goals. Since subgoals are generated in an attempt to fulfil higher-level goals, they must be abandoned rather than blindly pursued if the parent goals are abandoned or changed.

    U: I would like the number for Mr. Smith.
    S: Do you mean Jim Smith or Joe Smith?
    U: I'm sorry, I mean Fred Jones.

In this example, the system generates a subgoal of deciding between two alternative fillers corresponding to the ambiguous slot specification by the user. However, the user does not cooperate with the system's subgoal, but instead changes his own subgoal from which the system's subgoal is derived. This means that the system should now abandon its subgoal and attempt to fulfil the user's revised subgoal. Note that the commitment of the system always to cooperate, without any corresponding commitment on the part of the user, means that such changes in goals will always originate with the user.

The dependency of subgoals on the goals that generate them is not the only form of dependency that need concern simple service systems. Since parameters to services can be interdependent, subgoals involved in establishing one slot can be dependent on another slot being filled or remaining unchanged. For example, the time, date, and party-size of a restaurant reservation are mutually constraining, so that changing one may require changing another. Work on representation for these more general types of dependencies has been done by Hayes [17] and Doyle [9].

To end this subsection, we might comment on the general form of subgoals that arise for simple service systems.
The most important are subgoals to impart or acquire information; commonly, the user will have subgoals of imparting the specifications for the parameters (slots) of the service, and the system will, when appropriate, generate subgoals for acquiring such specifications from the user. Lower-level subgoals of this type concern parts of these specifications, i.e. slots of frames which fill the slots in the main frame for the service (e.g. the surname for a name).

Subgoals can also concern the acquisition of information relevant and prerequisite to specifying a slot. Thus, prior to specifying the time of a restaurant reservation, a user might ask what time the restaurant closed. If the user was asking the question because he wanted a reservation as late as possible, this question would represent a subgoal of the subgoal of specifying the time. Since an informational question could also signal a switch to a completely different subgoal, knowledge of what information was relevant to which slot specifications would be useful to the system to enable it to keep track of which subgoals are current.

    S: What time would you like your reservation?
    U: What time do you close?

In this example it would be desirable for the system to appreciate that the user was probably cooperating with the subgoal it had generated, rather than introducing a distinct subgoal of his own. However, in general, arbitrary amounts of knowledge may be required to make this determination; the user might, for instance, ask how far the restaurant was from the Opera House in trying to specify the time of his reservation. The best strategy may be to recognize common examples of this type of subgoal as special cases, and answer all other informational questions with the presumption that they represent subgoals of the current goal or subgoal. Note that from this perspective, general questions about the system's ability can be viewed as subgoals of the user's top-level goal to make use of the system's services. Other methods of keeping track of goals and subgoals are discussed in Section 6.4.

In summary, subgoals can arise for both the user and the system and they can be nested several levels deep. The system should always cooperate with the subgoals established by the user, but the user cannot be expected always to cooperate with subgoals established by the system; he may pursue other subgoals independently, or change previously established goals or subgoals on which the system's subgoal depended. In either case, the system should follow the user's initiative rather than doggedly pursuing its own subgoals. The most common form of subgoal concerns the specification of a parameter of the service or of a sub-slot of such a parameter. Another common type seeks information useful for establishing such a specification.

6.3. Focus

When people engage in conversation, their attention is generally directed to a highly specific topic or focus. This focus of attention is typically the same for all the participants, giving rise to the impression that they are "talking about the same thing".
A shared focus of attention allows dialogue participants to economize substantially on what they say through use of ellipsis and anaphora; they assume listeners can fill in the missing information by use of the common focus. Since people make such economies of expression very frequently and naturally, any gracefully interacting system must be able to keep track of the shifting focus of its conversation and use it to resolve the ellipsis and anaphora of its user (as well as perhaps generating its own). The problems involved in keeping track of conversational focus are the topic of the next section; in the remainder of this section, we will provide an expedient definition of focus for simple service domains, and show how such a focus can help resolve anaphora and ellipsis.

A definition of focus that is both precise and general is hard to formulate. Even though the notion of focus (or topic) has been used in computational systems by Grosz [14] and Sidner [37] as well as in more traditional linguistic work, including some by Grimes [13] and Hockett [22], it has been viewed in different ways according to the research goals involved. Grosz, for instance, modelled dialogues in which an expert guided an apprentice through a mechanical assembly task, and found it convenient to equate the focus of the dialogue with the set of objects involved in the assembly step currently being undertaken or under discussion. Sidner, on the other hand, dealt with monologues specifying arrangements for a meeting, and so took focus to be a complete single entity: either a meeting as a whole or some parameter of it, such as its time or place. In both cases, the entity or entities in focus are used to help resolve anaphoric references by providing a set of potential referents for such references.

Nor will we attempt to define focus in general, or even for all of graceful interaction; instead we will again confine ourselves to simple service systems. For such systems, however, we propose an extension of the notion of focus by equating it with the currently active subgoal. This view of focus subsumes the notion of a collection of entities to which attention is currently directed, since the current subgoal also defines a set of entities which are currently receiving attention - those involved in the subgoal. This set can be used in the same way as just mentioned to provide potential referents for anaphora.

Using the current subgoal as the focus also, however, helps in the resolution of ellipsis. The ellipsed information can often be filled in, via the assumption that the elliptical utterance is intended to fulfil the current subgoal, thus providing a complete interpretation for the elliptical utterance. For instance, an answer of "X" in response to a question of "Do you mean X or Y?" could be seen as fulfilling the subgoal set up by the question of choosing between X and Y. Interpreted in the light of such a focused subgoal, the elliptical utterance, "X", has the coherent interpretation of choosing X instead of Y. In other words, this extended notion of focus is useful in the same way as the more usual one in the resolution of anaphora, but unlike the other one, can also be used in the resolution of ellipsis.
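
The interpretation of an elliptical reply as an attempt to fulfil the focused subgoal can be illustrated with a minimal Python sketch. It is offered only as an illustration under simplifying assumptions: the `ChoiceSubgoal` class, the `resolve_reply` function and the `ORDINALS` table are hypothetical names introduced here, not part of any system described in this report, and crude word matching stands in for real parsing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChoiceSubgoal:
    """A focused subgoal asking the user to choose among alternatives (hypothetical)."""
    slot: str                 # slot being filled, e.g. "person"
    alternatives: List[str]   # the entities currently in focus


# Ordinal words that can refer anaphorically to a listed alternative.
ORDINALS = {"first": 0, "second": 1, "third": 2}


def resolve_reply(focus: ChoiceSubgoal, reply: str) -> Optional[str]:
    """Interpret a (possibly elliptical) reply as fulfilling the focused subgoal.

    Returns the chosen alternative, or None when the reply does not pick out
    exactly one alternative (which may signal that the user has shifted focus).
    """
    words = [w.strip(".,!?'\"").lower() for w in reply.split()]
    # Anaphora by position, such as "the first one".
    for word in words:
        if word in ORDINALS and ORDINALS[word] < len(focus.alternatives):
            return focus.alternatives[ORDINALS[word]]
    # Partial descriptions, such as "Bill" or "I mean Bill".
    matches = [alt for alt in focus.alternatives
               if any(w in alt.lower().split() for w in words)]
    return matches[0] if len(matches) == 1 else None
```

Note that the sketch never reconstructs a full sentence from the fragment; it only asks which entity in the focused subgoal the reply selects. A reply that selects none of them is left uninterpreted rather than forced into the current focus.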
This latter use of focus, in the resolution of ellipsis, appears to be novel, but see also the work of Levin and Moore [25]. Of course, it could be argued that the two aspects of focus - as a collection of potential anaphoric referents, and as a goal with the potential for providing ellipsed information - are separate and should not be dealt with in related ways. We believe, however, that the two aspects are sufficiently intertwined for it to be profitable to treat them through a common mechanism.

At this point, some concrete examples will make it clearer how focus, as we are proposing it, can help with anaphora and ellipsis.

    U: I'd like the number for Mr. Smith.
    S: Do you mean Bill Smith or George Smith?
    U: I mean Bill.

Here the user refers anaphorically to Bill Smith by use of the first name. This reference is easy to resolve, however, because the focus is on the user generated subgoal of filling the person slot of the directory assistance frame, and more particularly on the system generated subgoal of getting the user to distinguish between two alternatives; this subgoal involves two entities and the reference can easily be resolved to one of them. A similar resolution could be made if the user had, for instance, said, "I mean the first one." Note that this resolution depends upon the fact that the user's utterance does not change the current subgoal and hence the current focus.

If we further change the example so that the user's reply is just "Bill", or "The first one", this ellipsis can easily be resolved by assuming that the user is cooperating in the focused subgoal of choosing between the alternatives, so that the object mentioned actually represents the user's choice. Note that there is no need to reconstruct a phrase such as "I mean Bill" (or "I think it's Bill", "Bill is the one I want the number for", etc.); it is enough to interpret the referent as fulfilling the system's subgoal of getting the user to indicate a choice (see Section 3.2).

Consider also the following extract from the extended example of Section 9.2, in which the time of a restaurant reservation is under discussion.

    S: I can't give you eight o'clock; it would have to be seven or after nine thirty.
    U: You don't have anything between those times?
    S: No! I'm afraid we don't.
    U: Well, the later time then.

The system's "it" in the first line refers to the time of the reservation rather than eight o'clock; this corresponds to the focus of filling the time slot. The system's utterance refines the goal of filling this slot into a subgoal involving a choice on the part of the user, and "those times" in the user's reply has to be interpreted in this context, since in particular, the referenced set of times does not include eight o'clock. The user's "anything" is less straightforward, however; if a referent had to be found for it, a non-specific reference to a time would probably be best, but it is probably preferable to interpret the utterance as a whole as asking the system to confirm its restriction on times, without trying to interpret "anything" individually. It thus sets up a subgoal on the part of the user, in terms of which the system's elliptical reply is easy to interpret.
However, the focus on the choice established in the system's first utterance is still available, and must be used to interpret the ellipsis and anaphora in the user's second utterance.

The importance of focus, then, is related to the commonness of anaphora and ellipsis in natural language use, and they are very common indeed. Both allow a speaker to economize on what he says by omitting unnecessary information that his listener can fill in because their attention is focused on the same common ground. Since the user of a gracefully interacting system will naturally economize in this way, it is vital for the system to be able to use its awareness of the conversational focus to make good the omitted information.

In summary, for simple service systems it is convenient to equate the focus of the conversation with the subgoal that the user and the system are currently cooperating on. Anaphoric references can often be resolved into objects involved in the subgoal currently in focus, and ellipsis can be resolved by assuming that the elliptical utterance is fulfilling the focused subgoal.

6.4. Keeping track of focus

Since focus is so important in the interpretation of ellipsis and anaphora, and since these phenomena are so commonplace, it is very important for a gracefully interacting system to keep track of the focus as it shifts during the course of a conversation. It is easy enough for a system to tell when the focus changes as a result of some new subgoal it establishes, but what about when the user changes subgoals and therefore focus?
The problem is a very difficult one in general, and even for simple service systems, we can only suggest a direction for experimentation. The main problems are the unpredictability of focus shifts in terms of both when they occur and what the new focus will be. As Sidner [37] observes, focus shifts cannot be predicted in advance; they can only be detected after they occur. In the remainder of this section, we will investigate what the new focus can be after a shift (the answer will be virtually anything), and suggest some approaches to detecting such shifts.

A very common pattern of focus shift arises when goals are refined into subgoals, which in turn spawn subgoals of their own. For example, in:

    U: What is the number for Bill Smith?
    S: Did you say Bill or Phil?
    U: Bill with a 'B'.
    S: 'B'. O.K. The number is 2597.

the user first focuses on the person parameter (slot) of the directory task (frame); the system then changes the focus to the first name of the person and in particular to a choice between two alternatives for that slot. The user then cooperates with the system's goal and chooses one of the alternatives, but in doing so refines the focus yet again to the first letter of the first name. In its reply, the system first maintains the focus by acknowledging the letter, and then shifts the focus drastically by providing the user with the number he desired. The last focus shift can be thought of as popping the nested set of foci that had been built up through the preceding utterances, and then moving on to the next task - in this case performing the requested service since its parameters are now complete. Popping back through all the nested levels and going on to the next major subgoal is probably the most common way of terminating such nesting, but partial popping can also occur:

    U: What is the number for Mr. Smith?
    S: Do you mean Bill Smith or Joe Smith?
    U: I mean Jim Smythe.

Here the user does not cooperate with the system's subgoal of distinguishing between two alternatives, but instead pops back to the previous level and retries his goal of specifying the person slot to the system.

Focus shifts do not, however, always follow this nice stack discipline. For instance, the user can at any time reopen a focus that was apparently closed, either as a result of something the system tells him which might make such a shift more predictable, as in:

    U: I would like a reservation for two tonight.
    S: What time would that be for?
    U: About seven.
    S: I'm afraid the earliest I could give you is eight thirty.
    U: What about tomorrow night?

or for reasons best known to himself:

    U: I would like a reservation for a party of six tonight.
    S: What time would that be for?
    U: No! There will be seven of us.

Even worse, the user can shift the focus to subgoals of previously open foci, which themselves have never been open, as in:

    U: What is the number for Bill Smith?
    S: The number for Bill Smith is 2597.
    U: No! I said Phil with a 'P' as in parrot.

where the user not only reopens the focus on specifying the person, but jumps immediately into specifying the first name slot. Note that the specification for the surname remains the same, which is added evidence that the old focus of filling the person slot is reopened and then refined.

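The orderly, stack-like pattern of focus shift, including the partial popping in the "Jim Smythe" exchange, can be sketched in a few lines of Python. The `FocusStack` class and the subgoal labels used with it are hypothetical, introduced only for illustration; the sketch deliberately covers just the orderly case and makes no attempt to handle the reopening of closed foci shown in the later examples.

```python
from typing import List


class FocusStack:
    """Nested foci (subgoals); the innermost one is the current focus."""

    def __init__(self, top_level_goal: str) -> None:
        self._stack: List[str] = [top_level_goal]

    @property
    def current(self) -> str:
        """The subgoal currently in focus."""
        return self._stack[-1]

    def refine(self, subgoal: str) -> None:
        """Shift focus to a new subgoal of the current focus."""
        self._stack.append(subgoal)

    def pop_to(self, goal: str) -> List[str]:
        """Return to an enclosing focus, abandoning the subgoals nested in it.

        Returns the abandoned subgoals, innermost first.
        """
        if goal not in self._stack:
            raise ValueError(f"{goal!r} is not an open focus")
        abandoned = []
        while self._stack[-1] != goal:
            abandoned.append(self._stack.pop())
        return abandoned
```

In the "Jim Smythe" exchange, the system would call `refine` twice (once for the user's subgoal of specifying the person, once for its own subgoal of choosing between the two Smiths) and then `pop_to` the person subgoal when the user declines to choose, which abandons the system generated subgoal while leaving the user's subgoal open to be retried.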
These examples suggest that at any time there is a tree of potential foci, i.e. subgoals which may become the current focus; for a simple service system, the root of the tree would be the top-level goal of getting the service performed, the next level would be the goals of specifying the main parameters or slots of the service, the next level, specifying their slots, etc. The stack-like pattern of focus shifts, described above as the most common type, then corresponds to depth-first shifts down one branch of the tree, followed by a return up the branch and a transfer to the highest node of another branch. Reopening of old foci corresponds to visiting a node in the tree more than once, and shifting focus to a subgoal of a previously open focus, which itself has never been open, corresponds to visiting for the first time a node whose parent has been visited before.

In the examples above, the shifts across branches always involved shifts to completely different branches, since they involved changes in which top-level parameter of the service was being considered. Shifts can also occur across branches which meet below the root of the tree, as the following example shows.

    U: What is the number for Mr. Smith?
    S: Do you mean Joe Smith or Fred Smith?
    U: I mean the Smith on the eighth floor.

Here the user does not cooperate with the system's subgoal of distinguishing between the Smiths by first name, but instead chooses to distinguish them by location, a sibling subgoal of the goal of filling the person slot.

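One way to make the tree of potential foci concrete is sketched below in Python. The `FocusNode` class, the `classify_shift` function and the node names used with them are hypothetical, chosen only to mirror the directory assistance examples; the labels returned are informal names for the patterns discussed above, not established terminology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class FocusNode:
    """A potential focus: a goal or subgoal in the tree for the task."""
    name: str
    parent: Optional["FocusNode"] = field(default=None, repr=False)
    children: List["FocusNode"] = field(default_factory=list)
    visited: bool = False  # has this focus ever been open?

    def add(self, name: str) -> "FocusNode":
        """Create and attach a child subgoal."""
        child = FocusNode(name, parent=self)
        self.children.append(child)
        return child

    def ancestors(self) -> List["FocusNode"]:
        """Enclosing goals, nearest first."""
        node, chain = self.parent, []
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain


def classify_shift(current: FocusNode, new: FocusNode) -> str:
    """Name the kind of focus shift from `current` to `new`."""
    if new is current:
        return "no shift"
    if new.parent is current:
        return "refinement"                # down one level of the same branch
    if new in current.ancestors():
        return "pop"                       # back up the same branch
    if new.visited:
        return "reopening"                 # an old, apparently closed focus
    if current.parent is not None and new.parent is current.parent:
        return "sibling"                   # e.g. first name -> location
    if new.parent is not None and new.parent.visited:
        return "new subgoal of old focus"  # e.g. "No! I said Phil..."
    return "other"
```

The `visited` flag is what distinguishes the less orderly shifts: a system that discarded it once a branch had been popped would be unable to tell a reopened focus from a fresh one.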
Finally, a user can always completely abandon his top-level goal, and thus move the focus entirely out of the tree, as in:

    U: I would like a reservation for two this evening.
    S: It would have to be after ten.
    U: I'll try somewhere else.

Note also that in this example the user covers two subgoals in his first utterance; the system must be prepared to deal with problems in one of them without forgetting the other.

The net conclusion from all these examples is that while there are certain common types of focus shift, which follow a stack discipline corresponding to progress through the specification of the service and the several levels of detail that can be involved, focus can shift to virtually any part of the tree of potential foci, including shifts to old, apparently closed, foci or to subparts of such old foci that have never themselves been open.

Previous work on keeping track of focus has been based on more restrictive assumptions. Modulo their different views of focus, both Grosz and Sidner have assumed that focus shifts in a quite orderly way through a tree of potential foci; in particular, they assumed that no node or branch of the tree was visited more than once, and in some cases, that the branches of the tree were traversed in a partially fixed order. They then detected focus shift from mention of entities that belong in foci either further down or further up the same branch of the tree as the current focus, or in a focus on one of the permissible next branches.
We have no immediate solution for the detection of more general types of focus shift; a good starting point might be, in the same way as Grosz and Sidner, to search for appropriate foci when the entities or goals mentioned do not fit in the current focus, but to search over the entire tree of possible foci for the task domain. In addition, since old foci can be reopened and specific objects associated with them reused, the relevant information must be associated with the nodes that have already been traversed.

6.5. Summary of Goals and Focus

We can summarize the main points made in this section as follows:

- Ordinary human conversation is highly goal oriented; people are almost always pursuing one or more goals in saying what they say.
- A gracefully interacting simple service system need have no high-level goals of its own; it merely cooperates with the user in fulfilling his.
- The only goals such a system need recognize in its user are those it can help satisfy; this precludes the need for a sophisticated model of top-level goals.
- Subgoals must be treated with greater care; they can be nested arbitrarily deeply and originate from both the system and the user, but the system generated subgoals are always derived from and subservient to those of the user.
- Common types of subgoal concern the specification of parameters of the service, or one of their subslots; others seek information useful for establishing such specifications.
- People use a shared view of what a conversation is about, called the focus of the conversation, to economize substantially on what they say through anaphora and ellipsis.
- For a simple service system, viewing focus as the currently active subgoal helps in the resolution of both anaphora and ellipsis.
- Although there are some common patterns of focus shifts, the focus of a conversation has the potential to shift unpredictably to any of a tree of potential foci.

7. Identification from Descriptions

7.1. The Identification Problem

A fundamental component of human conversation is the ability of a listener to use a speaker's description of a previously memorized entity (object or event) to identify and recall the entity. If a definite focus of attention has already been established in the conversation, then as we saw in Section 6.3, descriptions can be quite cryptic or anaphoric (the woman; the second one; it). On the other hand, if no context has been established, or if the object to be described is not in the established context, much fuller forms of descriptions must be used (the camping trip we went on last Easter; a word beginning with 'O' meaning fawning or servile; the lady that I saw you with last night). One of the more remarkable features of human memory is the ability to identify from a suitable description virtually any previously experienced or known object or event (that man we met at the second hotel we stayed at when we went on vacation last year).

When he describes an entity, a speaker aims to identify the entity uniquely to his listener. To do this, he uses his beliefs about the listener's state of knowledge to construct a description that is informative enough to identify the object uniquely to the listener (possibly with respect to a current context).
Most usually he succeeds, but sometimes a listener will recall more than one entity fitting the description, or may not be able to identify any entity fitting the description. In other words, there are three possible outcomes to any attempt to identify an entity through a description:

    unique - only one entity fits the description
    ambiguous - more than one entity fits the description
    unsatisfiable - no entity fits the description

In accordance with the Principle of Implicit Confirmation (Section 2.2), a speaker assumes that his communication is successful, and in particular that his descriptions identify unique entities for his listener, unless the listener indicates otherwise. The listener thus need do nothing if a description falls into the unique case, but must indicate a problem to the original speaker in the ambiguous or unsatisfiable cases. In indicating such a problem, the listener will usually initiate a dialogue during which the description is revised to specify a unique entity to the listener.

Identification from descriptions is clearly very important for a gracefully interacting system, which must identify entities from its user's descriptions, and must in turn describe things to its user. A directory assistance system must be able to accept descriptions of its listings, and in turn describe the corresponding numbers to its user; and a restaurant reservation system must be able to accept and give out descriptions of times, dates, party sizes, etc.

Fortunately, identification from descriptions for gracefully interacting systems does not involve the quite awe-inspiring range of description and recall found in human conversation.
In simple service systems, in particular, entities that the system must recognize from the user's descriptions are essentially the entities that serve as parameters to the service, and these typically fall into a few highly restricted categories such as listings for directory assistance, times, dates, party sizes, and names for restaurant reservations, and mailboxes, hosts, and messages for electronic mail systems. Once a description has been obtained from a user, identification as an entity known to the system is also relatively simple, since each parameter to the service typically must be a member of a finite class known to the system. A directory assistance system, for example, would know the set of all listings that it could consult, and for a restaurant reservations system, the date, time, and party size must each be a member of a finite set. The name in which a reservation is made is an exception to this rule.

Even though the range of entities that can be described, and the method of identifying entities from their descriptions, are much simpler than in general human conversation, the results of attempting identification from descriptions can be classified into the same three categories: unique, ambiguous, and unsatisfiable. A gracefully interacting system must, therefore, be able to participate on either side of a dialogue for clarifying unsatisfiable or ambiguous descriptions; these dialogues will be the subject of the following subsections.
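Because each parameter is drawn from a finite class known to the system, the three-way classification can be sketched in a few lines of Python. This is a minimal sketch only: the slot names, the sample `LISTINGS`, and the functions `matches` and `identify` are hypothetical, and exact slot equality is an assumed matching rule.

```python
# Hypothetical directory listings; every slot name here is an assumption.
LISTINGS = [
    {"first name": "Jim", "surname": "Smith", "number": "2597"},
    {"first name": "Joe", "surname": "Smith", "number": "2611"},
    {"first name": "Ann", "surname": "Jones", "number": "3040"},
]


def matches(entity, description):
    """True if the entity agrees with every slot given in the description."""
    return all(entity.get(slot) == value for slot, value in description.items())


def identify(description, entities):
    """Classify a description as unique, ambiguous, or unsatisfiable."""
    candidates = [e for e in entities if matches(e, description)]
    if len(candidates) == 1:
        return "unique", candidates
    if candidates:
        return "ambiguous", candidates
    return "unsatisfiable", []
```

For instance, under these assumptions `identify({"surname": "Smith"}, LISTINGS)` yields the ambiguous case with two candidates, which is the situation that the clarifying dialogues of the following subsections are meant to handle.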
Our examples and analysis in these subsections will be in terms of simple service domains, and no claims are made beyond this. Many of the strategies presented are, however, based on protocols of human conversations, and have much relevance to broader classes of human communication. We know of no other work that has attempted to analyse this type of clarification dialogue.

7.2. Ambiguous Descriptions

When a speaker describes an entity to a listener in such a way that the listener can interpret the description as referring to more than one entity, the description is said to be ambiguous. In order for the speaker's attempt at description to succeed in such a situation, a clarifying dialogue must ensue to resolve the ambiguity. In this section we investigate the strategies involved in such a resolution, restricting ourselves to the case in which the descriptions are given to a gracefully interacting simple service system by its user.

The simplest strategy for ambiguous descriptions is to inform the user of the options and ask him to make a decision between them as in:

    U: What is the number for Smith?
    S: Do you mean Jim Smith or Joe Smith?

Note that in distinguishing the options by the first name of the person, the system refines the focus of the conversation to the corresponding slot in the description of a person. Sometimes the alternatives may be too many to list and in this case the system can fix on a distinguishing feature of the alternatives and ask the user about it.

    U: What is the number for Smith?
    S: There are 59 Smiths listed. Do you know the first name?
While unavoidable in this case, this strategy has the disadvantage of inviting a wider variety of reply than the strategy of explicitly listing the alternatives. In general, some method for weighing the relative advantages of the various strategies of clarification according to prevailing circumstances may be needed. Both strategies, however, change the focus to be a distinguishing aspect (slot) of the entities fitting the original ambiguous description. This focus shift serves as the basis for understanding a continuation of:

    U: I think it starts with a "P" ...

to the above example.

There is, of course, no guarantee that the user will choose to follow the system's shift of focus. He may instead try to resolve the ambiguity by giving distinguishing information of his own choosing; in particular, by further specifying an aspect of the entity different from the one chosen by the system.

    U: I don't know the first name, but the address is Willow Crescent.

This corresponds to a shift of focus to a sibling node in the tree of possible foci. In any case, the system must be able to take whatever distinguishing information the user gives it and apply it to the resolution of the ambiguity. If the ambiguity is still not resolved, yet more interaction must take place with the user.

Instead of trying to resolve the ambiguity, the user may also change the original description, giving

    U: No!! I meant Smythe.

as a possible continuation for the last example, abandon the attempt at identification completely, or reiterate the ambiguous description. These cases must all be recognized separately, and appropriate responses given.
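The two clarification strategies, and the need to accept distinguishing information on a slot other than the one the system asked about, might be sketched as follows. The functions `clarification` and `narrow`, the threshold `max_listed`, the slot names, and the wording of the questions are all assumptions made for illustration; how to weigh the strategies against each other is left open in the text.

```python
def full_name(entity):
    """Assumed way of naming a candidate to the user."""
    return f"{entity['first name']} {entity['surname']}"


def clarification(candidates, slot_order, max_listed=3, label=full_name):
    """Return (question, focus_slot) for an ambiguous description.

    The focus moves to the first slot in `slot_order` whose value differs
    among the candidates. Few candidates are listed explicitly; many are
    handled by asking about the distinguishing slot instead.
    """
    slot = next(
        (s for s in slot_order if len({c.get(s) for c in candidates}) > 1),
        None,
    )
    if slot is None:
        return None, None  # nothing known to the system distinguishes them
    if len(candidates) <= max_listed:
        options = " or ".join(label(c) for c in candidates)
        return f"Do you mean {options}?", slot
    return (
        f"There are {len(candidates)} matching listings. "
        f"Do you know the {slot}?",
        slot,
    )


def narrow(candidates, slot, value):
    """Apply whatever distinguishing information the user chose to give."""
    return [c for c in candidates if c.get(slot) == value]
```

Note that `narrow` takes the slot as an argument rather than reusing the slot chosen by `clarification`, so a reply such as "the address is Willow Crescent" can be applied even though the system asked about the first name.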
Occasionally, for domain dependent reasons, one of the alternatives for an ambiguous description may be much more likely than any of the others. In such cases, the alternative may be offered as a yes/no option to the user without referring to the others. Suppose reservations are available only from 9:30 onwards:

    U: I'd like a reservation for two tonight at 8:30 or later.
    S: Would 9:30 be OK?

Even though the user's specification covered 8:30 and all later times, 9:30 is the time of those available that the user is most likely to prefer. A similar strategy can be followed if the user is unlikely to be concerned about which of the alternatives is chosen.

There may also be user dependent reasons for the system to prefer one alternative referent over another. If the system has previous experience of a particular user, it might choose an alternative which it knows the user normally prefers. A restaurant reservation system, for instance, might have knowledge of the time a client usually prefers to dine. Again, knowledge of the current goals of a user might provide some clue to his choice between two alternatives. If a restaurant system knew, for instance, that the user wished to dine after the theatre, it might select an alternative reservation time that conformed to that constraint. We have not, however, considered the significant problems involved in using and acquiring idiosyncratic information, or in modelling and using an open-ended set of real-world goals.

7.3. Unsatisfiable Descriptions

In this section we consider descriptions from which a listener cannot identify any entity he knows about. The basic human strategy in such cases seems to be to indicate the unsatisfiability as well as indicating any near-misses, i.e.
entities that "almost" fit the description. As before we will discuss unsatisfiable descriptions assuming that the listener is a gracefully interacting simple service system and the speaker is its user.

When a user's description is unsatisfiable by any entity consistent with the system's constraints, the basic strategy must be to inform the user of the problem and expect him to either change the description so that it is satisfiable or give up. In doing this, the system should provide the user with as complete a picture as possible of why the description is unsatisfiable. This information will help the user in reformulating his description, and, in particular, help him avoid giving another description that is faulty.

    U: What is the number for Mr. Jim Smith?
    S: I'm sorry! There is no listing for any Smith with initial 'J'.

    U: I'd like a reservation for ten people tonight.
    S: I'm sorry! We can't accommodate parties larger than seven.

    U: I'd like a reservation for two for tonight.
    S: I'm afraid the earliest reservation is next Thursday.

Implicit in these explanations of unsatisfiability is the search for "near-misses", entities which nearly fit the description. It is considered helpful for humans to suggest near-misses, and a gracefully interacting system should do the same. Of course, any metric for measuring whether an entity is a near-miss must be quite domain dependent.
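As a rough illustration of one such metric for a directory domain, the sketch below counts the slots on which an entity disagrees with the description and treats an entity as a near-miss if it disagrees on at most one slot. The metric, the threshold, the slot names, and the reply wording are all assumptions; a realistic system would presumably use a metric tuned to its domain (for example, similarity of spelling or sound for names).

```python
def mismatches(entity, description):
    """Number of described slots on which the entity disagrees."""
    return sum(
        1 for slot, value in description.items() if entity.get(slot) != value
    )


def near_misses(description, entities, max_mismatch=1):
    """Entities that fail the description, but only by a small margin."""
    return [
        e for e in entities
        if 0 < mismatches(e, description) <= max_mismatch
    ]


def unsatisfiable_reply(description, entities):
    """State that nothing fits, then offer any near-misses."""
    wanted = f"{description['first name']} {description['surname']}"
    names = [
        f"{e['first name']} {e['surname']}"
        for e in near_misses(description, entities)
    ]
    reply = f"We don't have a listing for {wanted}."
    if len(names) == 1:
        # A single near-miss is offered as a yes/no question, not assumed.
        return f"{reply} Do you mean {names[0]}?"
    if names:
        return f"{reply} There's one for " + " or ".join(names) + "."
    return reply
```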
The metric may also be dependent on the idiosyncrasies of a particular user, or on user goals only distantly related to the domain of the system, in much the same way as the automatic selection between alternative possibilities for ambiguous descriptions discussed at the end of the last section. As also mentioned there, we have not attempted to address the formidable problems that these topics raise.

Near-misses may be either unique (one near-miss) or ambiguous (several comparably distant near-misses). They can be treated similarly to the basic unique and ambiguous description cases, except that it must be made plain that the description was satisfied by no entity as it stood. In addition, the unique case must be presented in a yes/no question to the user rather than assumed.

    U: What is the number for Joe Smith?
    S: We don't have a listing for Joe Smith. Do you mean Jim Smith?

    U: What is the number for Joe Smith?
    S: We don't have a listing for Joe Smith. There's one for Jim Smith or Pete Smith or one for Joe Smythe.

In addition to the replies to the basic ambiguous case discussed above, the near-miss case can result in replies in which the user claims that the near-miss was what he said in the first place, or in which he repeats (possibly in the negative, "but there isn't any Joe Smith?") the original description.

7.4. Descriptions and Faulty Comprehension

In discussing the replies that a gracefully interacting system should make to ambiguous or unsatisfiable descriptions, we have assumed that the system comprehended the descriptions both fully and accurately.
A partially or uncertainly comprehended description can, of course, be classified in the same way as unique, ambiguous, or unsatisfiable, but the way in which the system reacts to such descriptions should be modified in the presence of faulty comprehension. We can distinguish the two subcases of incomplete and uncertain comprehension.

If the system's comprehension of a description is incomplete, so that some aspects of the description are comprehended, while others are not, then the system's reaction ought or ought not to be modified depending on the classification of the description. If such a description is ambiguous and the missing part of the description might possibly resolve the ambiguity, then the system should assume that it would, and ask the user a suitable clarifying question.

    U: What is the number for <GARBLE> Smith?
    S: Did you say Jim Smith or Joe Smith?

If, on the other hand, the uncomprehended part of the description could not distinguish the two alternatives, then the description should be treated in the same way as an ordinary ambiguous description. For example, if there is a Jim Smith and a Joe Smith, both with the same title, then the following sequence is appropriate.

    U: What is the number for <GARBLE> J. Smith?
    S: Do you mean Professor Jim Smith or Professor Joe Smith?
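The distinction between these two cases can be made concrete with a small Python sketch. It assumes, hypothetically, that the parser reports which slot was garbled; the function `clarify_partial`, the slot names, and the sample `PEOPLE` are illustrative only.

```python
PEOPLE = [  # hypothetical directory entries
    {"title": "Professor", "first name": "Jim", "initial": "J", "surname": "Smith"},
    {"title": "Professor", "first name": "Joe", "initial": "J", "surname": "Smith"},
    {"title": "Dr", "first name": "Ann", "initial": "A", "surname": "Jones"},
]


def full_name(e):
    return f"{e['first name']} {e['surname']}"


def titled_name(e):
    return f"{e['title']} {full_name(e)}"


def clarify_partial(known, garbled_slot, entities, label=full_name):
    """Choose a clarifying question for a partially comprehended description.

    `known` holds the comprehended slots; `garbled_slot` names the slot the
    system failed to comprehend. Returns None if the comprehended part is
    not ambiguous, in which case this sketch asks nothing.
    """
    candidates = [
        e for e in entities
        if all(e.get(slot) == value for slot, value in known.items())
    ]
    if len(candidates) <= 1:
        return None
    options = " or ".join(label(e) for e in candidates)
    if len({e.get(garbled_slot) for e in candidates}) > 1:
        # The missing part could have resolved the ambiguity: assume it did.
        return f"Did you say {options}?"
    # The missing part could not distinguish them: ordinary ambiguity.
    return f"Do you mean {options}?"
```

With the sample data, a garbled first name before "Smith" produces the "Did you say" question, whereas a garbled title before "J. Smith" falls through to the ordinary "Do you mean" question, since both candidates share the title.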
In the same way, if we assume that uncomprehended parts of a description can only further restrict the class of entities to which a description could refer, there is no point in the system's trying to determine the uncomprehended part of a description whose comprehended part is already unique or unsatisfiable; for instance, there is no point in trying to determine the garbled part of "Professor <GARBLE> Smith" if only one Professor Smith is known to the system. Of course, there are exceptional cases (negatively phrased descriptions) in which this assumption does not hold, but normally it will allow the system to ignore the uncomprehended part without ill-effect (see Section 2.4).

Now we turn to the case in which a description is completely, but uncertainly, interpretable; the uncertainty can be about one interpretation or be between several possible interpretations. The action a system should take in such cases depends very much on the classification of the uncertain interpretations as either unique, ambiguous, or unsatisfiable. If one of the alternatives is unique, while the others are all ambiguous or unsatisfiable, then this interpretation should be raised in certainty (on the grounds that people try to give unique descriptions), possibly to the extent that the system would want to accept this interpretation over all the others (perhaps checking it with an echo, see Section 2.4). In general, being unique will tend to raise a description in certainty, while being ambiguous will tend to lower it, and being unsatisfiable will tend to lower it even more. The picture changes, however, if an unsatisfiable alternative has near-misses, especially if it has just one near-miss.
In such a case it might be fruitful to re-analyse the input to see if the alternative could be reinterpreted as a direct description of the near-miss entity. If it could, then there would be reasonable grounds for assuming that it was the interpretation the user originally intended. The process of mapping descriptions onto entities must thus be closely coordinated with the process of analysing the descriptions linguistically.

7.5. Summary of Identification from Descriptions

We can summarize the main points of this section as follows:

- A fundamentally important part of human conversational skills is the ability of a listener to identify entities from a speaker's description of them.
- In the given context, such descriptions may be uniquely specifying, ambiguous, or unsatisfiable by any entity known to the listener.
- In the ambiguous case, the speaker should be informed of the alternatives or asked for further distinguishing information, but there is no guarantee that he will choose to make the distinction in the way the listener suggests.
- In the unsatisfiable case, the listener should mention "near-misses", entities that almost fit the description.
- If the description was only partially or uncertainly comprehended by the listener, the strategies he uses for ambiguous and unsatisfiable descriptions should be modified in a number of detailed ways.

8. Language Generation

The extraordinary ability of humans to make sense out of the most convoluted and obscure language forms makes language generation one of the less critical components of graceful interaction; even quite clumsy utterances by a gracefully interacting system could be understood without great difficulty by its user.
Nevertheless, a user cannot be presumed upon too much to compensate for a system's generative deficiencies; if a system's utterances are too clumsy, they will not appear natural; if they are over-detailed or too long winded, the user may become frustrated; in either case, the interaction will be less graceful than it could be.

In the remainder of this section, we discuss briefly some of the characteristics of human language that a gracefully interacting system should imitate in order to appear natural, including contextually dependent (anaphoric) references and ellipsis, and the inclusion of several different messages (illocutionary acts) in the same utterance, especially combining echoes with other output. We will also discuss generation considerations specific to gracefully interacting systems; in particular, the need to restrict the complexity of generated utterances in accordance with analytical ability.

When a human describes some entity, he does not normally incorporate all that he knows about the entity into the description; rather, as Grice [12] has pointed out, he normally uses sufficient information to identify the entity to his listener, but no more. Thus, if a speaker wanted to describe a black desk, he might just say "the desk" if he thought that description was sufficient for his listener to understand what he meant. On the other hand, if he thought there was some danger of confusion with a different desk (a brown one, say), he might say "the black desk".
The way an entity is described in human conversation, then, depends not only on the entity and the speaker's knowledge of it, but also on the speaker's beliefs about his listener's state of knowledge and focus of attention. An appropriate description is typically the least informative one that is sufficient to identify the described entity relative to the listener's current focus of attention. Little attention has been paid in natural language research to generating descriptions appropriate in this sense, except for the description generation system of Levin and Goldman [26]. The problem is, nevertheless, important for a gracefully interacting system; if the descriptions such a system uses to identify objects to its user do not conform to human conventions, the user will find them unnatural and ungraceful. Thus, if a restaurant reservations system wishes to offer its user a reservation on Wednesday at 7pm, and if the user has already made it clear that he wants a reservation on Wednesday in the evening, the system need only and should only describe the reservation by "7" (as in "Would 7 be ok?").

Using these conventions leads to the generation of anaphoric references. A similar convention (of not giving more information than one has to) can result in elliptical utterances, e.g. replying "Jim Smith" instead of "I mean Jim Smith" in response to "Do you mean Jim Smith or Fred Smith?". It is clearly desirable for a gracefully interacting system to use its knowledge of the goal structure and focus of attention of a conversation to generate such ellipses and anaphora as well as analyse them (see Section 6.3).
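The idea of a least informative but sufficient description can be sketched as a search for the smallest set of slots that separates the target from everything else in the current focus. This is only one possible reading, offered under stated assumptions: the function `minimal_description`, the slot names, and the representation of the focus as a list of entities are hypothetical, and the sketch is not the method of Levin and Goldman.

```python
from itertools import combinations


def minimal_description(target, context, slot_order):
    """Smallest set of slot values that picks out `target` within `context`.

    `context` stands for the entities in the listener's current focus of
    attention. Slot sets are tried in order of increasing size, so the
    first one found is among the least informative that still suffice.
    Returns None if no combination of slots distinguishes the target.
    """
    others = [e for e in context if e is not target]
    for size in range(len(slot_order) + 1):
        for slots in combinations(slot_order, size):
            confusable = any(
                all(o.get(s) == target.get(s) for s in slots) for o in others
            )
            if not confusable:
                return {s: target[s] for s in slots}
    return None
```

With a black desk, a brown desk, and a black chair in focus, the sketch returns both kind and colour ("the black desk"); with only the black desk and the chair in focus it returns the kind alone ("the desk"). An empty result corresponds to a situation where a bare pronoun would do.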
Humans often use a single utterance to serve several different purposes (or following Searle's terminology [36] to perform several illocutionary acts). A person might, for instance, ask if a window was open, both to indicate that he was cold and to ask his listener to shut any open window, as well as to find out whether a window was indeed open. At a much less sophisticated level, an ability to combine more than one illocutionary act in a single utterance is also useful for a gracefully interacting system. In particular, such an ability is useful for the kind of indirect echoing described in Section 2.4. In indirect echoing, a listener who is unsure of his interpretation of a speaker's utterance incorporates his uncertain interpretation into his reply. If the original speaker does not comment on the echo, the interpretation is implicitly confirmed. Thus, if a restaurant reservations system was unsure about its recognition of "seven" in:

    I'd like a reservation for seven people,

a suitable reply would be:

    What time would your party of seven like to eat?

The substitution of "your party of seven" for the more normal "you" represents an indirect echo of the system's uncertain interpretation. Of course, using indirect echoes can (and in this example did) violate the convention discussed above of using descriptions only minimally sufficient for identification. However, this appears to be acceptable in human dialogue, presumably because of the clarification purposes thus served. Similarly, indirect echoes can also replace ellipses with fuller forms, as in:

    U: I want to go to Pittsburgh on May 17.
    S: What time on May 17 do you want to go to Pittsburgh?

instead of:

    S: What time do you want to go?

in which the date and destination are ellipsed.

Before ending this section, we must consider an aspect of generation that is much more important for gracefully interacting systems than in general human conversation. As mentioned in Section 2.3, the form of a speaker's utterance tends to influence the form of the listener's response. Thus it is vitally important for a gracefully interacting system with a limited ability to comprehend language to co-ordinate the questions and other output it generates with its receptive repertoire. This co-ordination has two parts: first, a system should try to restrict the content of the user's reply by asking questions that are as specific and directive as possible. To repeat an example from Section 2.3, if a system cannot understand "extension" in "I would like the extension for Jim Smith", it should ask "Do you want the number for Jim Smith?", rather than "What do you mean by 'extension'?". Secondly, a system should be able to understand standard transformations and convolutions of the phrasings in which it asks its questions. Thus, if a system asks "Would you prefer 7 o'clock or 8 o'clock?", it should certainly be able to understand "I prefer 7 o'clock.", and should probably also understand convolutions as distant as "7 o'clock would be my preference.".

In summary, although a human user can be expected to compensate somewhat for the deficiencies of the output produced by a gracefully interacting system, the more natural the output is, the more graceful the system will appear to the user.
In particular, the system should use its model of the focus to generate elliptical and anaphoric utterances when possible. However, sometimes other demands, particularly those of indirect echoing, can override this requirement for terseness and cause the information omitted from an utterance by ellipsis and anaphora to be reintroduced. Finally, a gracefully interacting system should produce only output that it can understand itself, in case it is transformed by the user and incorporated into his reply.

9. Realizing Graceful Interaction

Having outlined the basic components of graceful interaction, we now turn to the question of how to fit the components together in a single gracefully interacting system. In the following subsection, we give some design considerations for complete gracefully interacting systems, and go on to outline an architecture in line with these considerations for graceful interaction in simple service domains. We are planning to implement a practical gracefully interacting interface according to the proposed architecture, but at the time of writing have only just started to do this. The architecture proposed must therefore be regarded as tentative and subject to revision. A second subsection follows a hypothetical gracefully interacting system employing the proposed architecture through a realistic example dialogue in which many of the components of graceful interaction that we have discussed come into play.

9.1. Architecture of a Gracefully Interacting System

The overall design of any gracefully interacting system must take into account three important characteristics of the interplay between the various components of graceful interaction.
- At any point in the type of conversation we are considering, the role of the gracefully interacting system can be described in terms of just one of the components of graceful interaction.
- The several components of graceful interaction come into play in an inherently asynchronous manner, in the sense that a system's use of one component may be interrupted by use of one of the others without the first coming to a proper conclusion.
- Components interrupted in this way can often be restarted from where they left off after the interruption is over, but this is not inevitable.

Consider, for example, a dialogue concerned with identification of some entity by the system from a user's description. While this dialogue is taking place, the behaviour of the system can be explained purely in terms of the identification from description component. Yet at any time this line of conversation can be interrupted in favour of (say) a channel-checking dialogue, or an echo correction, or even an identification from a different description (see Section 6.4). Moreover, the original identification may or may not be taken up after the interruption is over.

Because the different components of graceful interaction can come into play in this inherently asynchronous manner, we propose to organize our gracefully interacting system as a set of autonomous modules, each capable of carrying on a particular type of dialogue: one for channel checking, one for identification from descriptions, etc.
At any time, exactly one of these modules would be active, in the sense that its state would hold the system's primary view of what was happening in the conversation. However, the other modules would be kept in a state of readiness, and if the user's next input could be dealt with better by one of the other modules, it would be made the currently active module. The state of a module usurped in this way would, of course, be preserved in anticipation of a possible continuation. Each module would indicate which inputs it was able to deal with by a set of expectations or conditions; the inputs it can deal with are those that satisfy the expectations. These expectations could change according to what had gone before; thus, a description identifier module would have different expectations after it had presented a list of alternative fillers of an ambiguous description, than after it had informed the user that a description was unsatisfiable; again, a module for detecting echo corrections would only have an expectation at all after the system had issued an echo.

What modules are needed, and what are their functions? We give here our current answer for simple service systems; we expect that most of the modules would stay the same for gracefully interacting systems of greater abilities.

- Channel maintainer: This module can conduct a channel-checking dialogue. It will initiate one if the user fails to reply to the system, and will respond to the user's initiation of such a dialogue.
- User's echo monitor: This module can conduct a dialogue to correct incorrect echoes by the user of what the system said. It will detect any echoing by the user and initiate a correction dialogue if the echo is incorrect.
- Echo correction monitor: This module can detect any correction by the user of echoes by the system. It will take part in a dialogue to ensure that the correction is understood properly.
- Explicit incomprehension monitor: This module can engage in a dialogue to clear up explicit complaints of complete or partial misunderstanding by the user. It will initiate its dialogue when the user makes such an explicit complaint.
- Incomprehension resolver: This module is used whenever the degree of comprehension of the user's input is unsatisfactory. According to the degree of incomprehension and the importance of what was not understood, it can echo or request clarification explicitly, and participate in any clarifying dialogue. (The echo correction monitor could be included as part of this module, since it deals with a logical continuation of echoing by the system).
- Description identifier: This module is used to identify entities from descriptions by the user. If necessary, it can request descriptions explicitly, and engage in dialogue to clarify ambiguous or unsatisfiable descriptions.
- Ability and goal describer: This module can engage in a dialogue about the system's ability, either in response to direct questions, or as a last resort when incomprehension resolution does not appear to be working.
  It can also describe to the user what it thinks the user is trying to do, how it is cooperating with that, and what the user needs to do to let it succeed; this information is available through the focus of the system.
- Initiator: This module is used to establish communication between the system and the user. It is always the first module activated. Besides ensuring that the communication channel is indeed open, it can conduct a dialogue about what the system is and can do; this allows the user to make certain that the system is the party he really wants.
- Terminator: This module is used when the conversation needs to be terminated. It can initiate a termination when the task has been completed, and can also be activated when the user decides to give up before finishing the task.

To coordinate all these modules, some kind of executive is needed. The job of this executive is to direct the user's input to the appropriate module, and to make sure that when one module completes its dialogue, control is turned over to the next appropriate module. For these reasons we call this executive module the focus maintainer. The focus maintainer we envisage is tailored directly to simple service systems, and assumes a frame-based representation as discussed in Section 4.3. It can handle input containing descriptions of more than one slot by parcelling the descriptions out to appropriate invocations of the description identifier module; it keeps track of which slot the user and system are currently trying to fill; it can detect any correction by the user of slots that have been previously filled; it can initiate filling of a slot when the user has not tried to fill it; it can deal with descriptions of several slots at once; it can decide when all slots have been filled; and it can keep track of the focus within a slot, i.e. which aspect of the entity described is being used to resolve ambiguities.

Parsing of each input will be done in accordance with the demands of flexible parsing discussed in Section 3.2; in particular, the parsing will be bottom-up, not strictly directional, and based on pattern-matching. Each module will parse its input separately (if this results in too much duplication of effort, partial results could be saved). This allows the expectations of each individual module to influence the parsing in two advantageous ways: first, at any given time the parser will only use patterns relevant to the module for which it is parsing; and secondly, expectations of specific replies can be used to recognize ellipses as complete utterances in themselves, as discussed in Sections 3.2 and 6.3. Using these strategies, input from the user that satisfies an active module's expectations can be recognized with a minimum of search.

How should the several modules listed above be implemented? One possibility is to model each of them as a finite state network. As was shown in the discourse component of the Hearsay system [19], a finite state network is very suitable for representing a straightforward conversation in which nothing unexpected happens. The preceding dialogue can be represented implicitly by the state of a finite network plus possibly the contents of some registers, and transition to other states can be made conditional on both the input and the contents of registers, thus ensuring that the state reached by a transition reflects all the relevant history of the conversation, and enabling the system's response to a user's input to depend both on the input, and on what has gone before in the dialogue. These features appear to make the finite state model extremely suitable for the several modules described above, since each of them can, by definition, conduct a straightforward conversation in which nothing unexpected happens. Figure 1 shows a finite state net for a simplified description identifier module; the diamond boxes are tests and the square boxes are actions, while the circles are states; transitions are made from state to state according to the conditions, which typically depend on the input, but in some cases also depend on registers (e.g. the test for one option specified, after the ambiguous description state). It is easy to see from the example that the expectations of a given module in a given state correspond to the conditions on the arcs leading from that state.

[Figure 1: Finite-state net for description identifier module]

Nevertheless, the finite state model for the individual modules suffers from certain disadvantages. These arise mainly because of the implicit way in which state information is represented in a finite state network. In such a network it is natural to encode current goals and past history implicitly in the state nodes. In Figure 1, for example, the state "ambiguous description" implicitly encodes the fact that the user has given the system an ambiguous description, and the goal of the system to get the user to make that description unambiguous. This implicitness makes finite state networks relatively hard to modify, since the state implicit in a node must in general be taken into consideration when links to or from that node are added or deleted. More serious perhaps is the problem that finite state modules would not be perspicuous to other modules, and in particular to the focus maintainer. We believe that communication between modules about the goals that they are currently pursuing may be very important to a complete system. It may be best to tackle the problem of modifiability by representing each module as an independent set of production rules. It should still be possible to compile such rule sets into finite state networks if extra efficiency proved important. Communication between modules might be achieved in the context of independent rule sets through shared global variables. On the other hand, it may be necessary to maintain a stack of system and user goals in a way similar to the "dialogue games" model of Levin and Moore [25].
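
As a rough illustration of the finite state option discussed above, the following Python sketch models two of the modules as small finite state networks with registers, and derives each module's expectations from the conditions on the arcs leaving its current state. It is not the implementation we describe as under way: the class names (Module, Executive), the keyword-matching conditions, the canned replies, and the numeric precedence field are all assumptions made for the illustration, and ranking modules by a fixed precedence when several expectations are satisfied is simply one policy assumed here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# An arc is (condition on input and registers, action producing a reply, next state).
Condition = Callable[[str, dict], bool]
Action = Callable[[str, dict], str]
Arc = Tuple[Condition, Action, str]

NUMBER_WORDS = ("two", "three", "four", "five", "six")  # toy vocabulary (assumption)


@dataclass
class Module:
    """One dialogue module modelled as a finite state network with registers."""
    name: str
    precedence: int              # hypothetical: lower number = more pressing
    arcs: Dict[str, List[Arc]]   # state -> arcs leaving that state
    state: str
    registers: dict = field(default_factory=dict)

    def expects(self, text: str) -> bool:
        # The expectations in a state are the conditions on its outgoing arcs.
        return any(cond(text, self.registers)
                   for cond, _, _ in self.arcs.get(self.state, []))

    def step(self, text: str) -> str:
        # Take the first arc whose condition holds: run its action, change state.
        for cond, action, next_state in self.arcs.get(self.state, []):
            if cond(text, self.registers):
                reply = action(text, self.registers)
                self.state = next_state
                return reply
        raise ValueError(f"{self.name}: no expectation satisfied by {text!r}")


class Executive:
    """Routes each input to the most pressing module whose expectations it meets."""

    def __init__(self, modules: List[Module]):
        self.modules = sorted(modules, key=lambda m: m.precedence)
        self.active = None

    def dispatch(self, text: str) -> str:
        for module in self.modules:
            if module.expects(text):
                # A usurped module keeps its own state and registers untouched.
                self.active = module
                return module.step(text)
        return "Sorry, I did not understand that."


def says(*phrases: str) -> Condition:
    """Crude keyword condition standing in for real parsing (assumption)."""
    return lambda text, regs: any(p in text.lower() for p in phrases)


def store_party_size(text: str, regs: dict) -> str:
    # Record the first known number word found in the input in a register.
    regs["party_size"] = next(w for w in NUMBER_WORDS if w in text.lower())
    return f"For {regs['party_size']} people."


channel = Module("channel maintainer", 0, {
    "listening": [(says("are you there"), lambda t, r: "Yes! Can you hear me?", "checking")],
    "checking": [(says("yes"), lambda t, r: "OK.", "listening")],
}, state="listening")

identifier = Module("description identifier", 2, {
    "awaiting size": [(says(*NUMBER_WORDS), store_party_size, "done")],
}, state="awaiting size")

executive = Executive([identifier, channel])
for user_input in ["Hello! Are you there?", "Yes I can.", "There will be five of us."]:
    print(executive.dispatch(user_input))
```

In this sketch the channel check is handled by the channel maintainer while the description identifier remains in its "awaiting size" state, so the identifier can pick up from that state when the party size finally arrives. Because all history lives in the state name and the registers, the sketch also shows the drawback noted above: nothing in a module states its current goal in a form another module could inspect.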
We expect to resolve these questions and many other more detailed ones through experience with the implementation that we are currently undertaking along the lines presented above of a gracefully interacting system for composing, sending, filing, and retrieving electronic mail.

9.2. Worked Example

Below we list a realistic example of the sort of dialogue that we expect a gracefully interacting system to be able to participate in. The example was constructed to illustrate most of the components of interaction that we have discussed. The system in the example is a simple service system for restaurant reservations. We will follow through the dialogue and see how the several modules described above contribute to the system's participation.

      S: Hello! This is the Brown Cow restaurant.
      U: Hello! The Brown Cow restaurant?
      S: Yes! Would you like to make a reservation?
      U: Yes! I'd like one for eight tonight.
   5  S: At eight o'clock, and for how many people?
      U: There will be (five) of us.
      <clicks on line>
      U: Hello! Hello! Are you there?
      S: Yes! Can you hear me?
  10  U: Yes I can.
      S: OK. For three people, and what name will that be under?
      U: No! I said five.
      S: Oh! Five people. I can't give you eight o'clock; it would have to be seven or after nine thirty.
      U: You don't have anything between those times?
  15  S: No! I'm afraid we don't.
      U: Well, we'll wait until the later time. Was it nine or nine thirty?
      S: It was nine thirty. What name will that be under?
      U: Teacon.
      S: Was that Beacon with a 'B', or Deacon with a 'D'?
  20  U: It's Teacon with a 'T'; T E A C O N.
      S: T E A C O N. Very good! We'll expect five of you at nine thirty.
      U: OK. Thank you!
      S: You're welcome! Bye!
      U: Bye!

The first three lines are dealt with by the initiator module. A conversation is always started by the activation of this module, and its first action is to give a greeting which both informs the user that the channel of communication is open, and tells him who he is talking to. On line 2, the user asks for some reassurance that he is talking to the right party; the module provides it, de-activates itself, and passes control back to the focus maintainer. Since the focus maintainer has no input to work on, it asks the user if he wants to use the (only) service that the system can provide.

If, at this point, the user had still not been satisfied about whom he was talking to, the initiator module would have been re-activated. Instead, the user satisfies the expectation of the focus maintainer by specifying a reservation. We are assuming a frame-based system, so the focus maintainer must now activate the description identifier module for each slot covered by the specification. Since the two slot specifications are uniquely identifying, the identifier module terminates in both cases without generating any output. However, the prior choice of which slots to use was not straightforward: "tonight" specifies the date slot, but "for eight" could specify either the party size or the time. The focus maintainer has two options: either ask the user explicitly which one he meant (using the incomprehension resolver module), or choose one of the options on its own initiative.
Since eight is a sufficiently large party size, and a sufficiently popular reservation time to make the time interpretation significantly more plausible, it makes its own choice. However, an assumption like this cannot be made without informing the user, so the assumption is included in an echo at the start of line five. Making the echo will set up an expectation by the echo correction monitor that the user will correct the assumption; however, in this case the system struck lucky; the echo is not challenged, and the assumption is thereby implicitly confirmed.

After echoing, the focus maintainer selects an empty slot, the party size, and starts the description identifier for that slot, which in turn asks for the party size. The user's reply conforms to the expectations of the now active description identifier by specifying unambiguously a party size of what the system hears as three, but which really is five. However, the system is sufficiently unsure of its perception of the number to mark it down for an echo.

At this point, an unexpected sound on the line causes the user to initiate a channel checking dialogue. This input satisfies the ever-present expectations of the channel maintainer, so that it becomes active at this point and conducts the dialogue up until the OK of line 11.
The conflict between the satisfaction of the expectations of both the channel maintainer and the description identifier is resolved by a precedence ordering on the several modules; the precedence is analogous to an interrupt priority scheme in which the most pressing concerns receive the highest precedence, so that the channel maintainer has the highest precedence, followed by the echo modules, with the description identifier having the lowest precedence.

Now that the channel has been re-established, the description identifier can continue from where it left off, so it echoes the party-size, and then de-activates itself, allowing the focus maintainer to take over. The focus maintainer then selects the only slot still to be filled, and starts up the description identifier on that slot, resulting in the request for the name. However, the user has detected the incorrect echo and issued a correction. This does not satisfy the expectations of the active description identifier, but does satisfy those of the echo correction monitor. The monitor corrects the description, and, after re-echoing to ensure that the message has been received correctly this time, terminates, allowing the focus maintainer to restart the previous invocation of the description identifier with the new description.

Unfortunately, the description is no longer satisfiable, since no table for five people is available at eight o'clock. In this situation, the normal response of the description identifier would be to look for near-misses for the party-size, but it would clearly be a mistake to offer the user a table for three or four at eight o'clock.
The problem arises because of the interaction between the party-size and the time in determining whether the description is satisfiable. The system, therefore, provides a method of specifying which slots of a frame constrain which others, and offers a mechanism for specifying which to change to resolve any conflicts that arise. In this case, the time, date, and party-size all interact to constrain each other, and changes are to be attempted in that order. The net effect is that instead of indicating that five is an unsatisfiable description of a party-size, the description identifier for party-size accepts five as a unique description, terminates, and restarts the description identifier for time with the previously given description.

This time the normal procedure for dealing with unsatisfiable descriptions applies, and the system lists its near misses. The user's reply in line 14 satisfies one of the expectations of the description identifier by essentially repeating the same specifications. The reply is a straightforward affirmation that the limitations given are truly correct. The tendency for humans to ask for confirmation when they hear something that they don't want to hear is very common, and there should perhaps be a larger recognition of that in a gracefully interacting system, but we have not investigated that possibility.

The user next accepts the later of the two times, but asks the system to confirm exactly which time it is.
The first part of this input terminates the description identifier for the time, and the second part is handled by the explicit incomprehension monitor, which generates the first part of line 17. The second part is generated by a re-initialized version of the description identifier for the name in which the reservation is to be made. The module is re-initialized because a module cannot be continued from where it left off if other modules at the same priority level have participated in the intervening dialogue. The ensuing determination of the name shows some of the special problems that can arise when proper names not previously known to the system have to be communicated. To understand such descriptions properly, a system needs a catalogue of common names and an ability to recognize spellings.

Finally on line 21, the reservation is complete and the system reconfirms the essential points. Again this is an echo which the user could contradict. In the event, he is satisfied and the conversation is finished off by the terminator module.

Conclusion

Graceful interaction is of fundamental importance for all computer systems that interact with humans. Without graceful interaction skills, interactive computer systems will continue to appear uncooperative, uncompromising, and altogether obtuse to the non-expert user. Yet, as we have seen, graceful interaction is a little studied field; some work has been done on some of the skills required, but such efforts have generally been tangential to the main thrust of the systems in which they were embedded. No workers have attempted to study the graceful interaction problem as a whole.
In this paper we have attempted to circumscribe graceful interaction as a field for study. To this end, we have defined a set of basic components of graceful interaction, including robust communication, flexible parsing, explanation facilities, focus mechanisms, identification from descriptions, and generation of descriptions. These components are certainly necessary for any kind of graceful interaction, but are also sufficient, we claimed, for systems restricted to offering one of the very large class of simple services.

Finally, we believe that graceful interaction is an idea whose time has come, and is ripe for implementation. We claim that none of the components we described is significantly beyond the current state of the art in dialogue processing (at least not for simple service systems), and we have presented an architecture for their integrated implementation in a single gracefully interacting system. A gracefully interacting simple service system (for sending and receiving electronic mail) conforming to this architecture is currently under implementation.

Acknowledgements

We are particularly grateful to Jaime Carbonell and Candy Sidner for their insightful comments on earlier versions of this paper.

References

1. Austin, J. L. How to Do Things with Words. Oxford University Press, 1962.
2. Bobrow, D. G., Kaplan, R. M., Kay, M., Norman, D. A., Thompson, H., and Winograd, T. GUS: a Frame-Driven Dialogue System. Artificial Intelligence 8 (1977), 155-173.
3. Carbonell, J. G. Subjective Understanding: Computer Models of Belief Systems. Ph.D. Th., Yale University, 1979.
4. Carbonell, J. R. Mixed-Initiative Man-Computer Dialogues. Tech. report 1970, Bolt, Beranek, and Newman, Inc., 1971.
5. Charniak, E. C. Toward a Model of Children's Story Comprehension. TR-266, MIT AI Lab, Cambridge, Mass., 1972.
6. Codd, E. F. Seven Steps to RENDEZVOUS with the Casual User. In Klimbie, J. W. and Koffeman, K. L., Ed., Proc. IFIP TC-2 Working Conf. on Data Base Management Systems, North Holland, Amsterdam, 1974, pp. 179-200.
7. Cohen, P. R. On Knowing What to Say: Planning Speech Acts. Ph.D. Th., Dept. of Computer Science, University of Toronto, January 1978. Also available as Tech. Report 118.
8. Colby, K. M. Simulations of Belief Systems. In Schank, R. C. and Colby, K. M., Ed., Computer Models of Thought and Language, Freeman, San Francisco, 1973, pp. 251-286.
9. Doyle, J. Truth Maintenance Systems for Problem Solving. TR-419, MIT AI Lab., Cambridge, Mass., 1978.
10. Fahlman, S. E. NETL: A System for Representing and Using Real-World Knowledge. MIT Press, Cambridge, Mass., 1979.
11. Fox, M. S., and Mostow, D. J. Maximal Consistent Interpretations of Errorful Data in Hierarchically Modelled Domains. Proc. Fifth Int. Jt. Conf. on Artificial Intelligence, MIT, 1977, pp. 165-171.
12. Grice, H. P. Logic and Conversation. In Cole, P. and Morgan, J. L., Ed., Syntax and Semantics: Speech Acts, Academic Press, New York, 1975.
13. Grimes, J. E. Topic Levels. Proc. of Theoretical Issues in Natural Language Processing: An Interdisciplinary Workshop, University of Illinois at Urbana-Champaign, July, 1978, pp. 104-108.
14. Grosz, B. J. The Representation and Use of Focus in a System for Understanding Dialogues. Proc. Fifth Int. Jt. Conf. on Artificial Intelligence, MIT, 1977, pp. 67-76.
15. Grosz, B. J. Focusing in Dialogue. Proc. of Theoretical Issues in Natural Language Processing: An Interdisciplinary Workshop, University of Illinois at Urbana-Champaign, July, 1978.
16. Hayes, P. J. On Semantic Nets, Frames, and Associations. Proc. Fifth Int. Jt. Conf. on Artificial Intelligence, MIT, 1977, pp. 99-107.
17. Hayes, P. J. A Representation for Robot Plans. Proc. Fourth Int. Jt. Conf. on Artificial Intelligence, Tbilisi, Georgia, 1975, pp. 181-188.
18. Hayes-Roth, F., Erman, L. D., Fox, M., and Mostow, D. J. Syntactic Processing in HEARSAY-II. Speech Understanding Systems. Summary of Results of the Five-Year Research Effort at Carnegie-Mellon University, Carnegie-Mellon University Computer Science Department, 1976.
19. Hayes-Roth, F., Gill, G., and Mostow, D. J. Discourse and Task Performance in HEARSAY-II. Speech Understanding Systems. Summary of Results of the Five-Year Research Effort at Carnegie-Mellon University, Carnegie-Mellon University Computer Science Department, 1976.
20. Hendrix, G. G. Human Engineering for Applied Natural Language Processing. Proc. Fifth Int. Jt. Conf. on Artificial Intelligence, MIT, 1977, pp. 183-191.
21. Hendrix, G. G. Expanding the Utility of Semantic Networks through Partitioning. Proc. Fourth Int. Jt. Conf. on Artificial Intelligence, Tbilisi, Georgia, 1975.
22. Hockett, C. F. A Course in Modern Linguistics. MacMillan, New York, 1958.
23. Kaplan, S. J. Cooperative Responses from a Portable Natural Language Data Base Query System. Ph.D. Th., Dept. of Computer and Information Science, University of Pennsylvania, Philadelphia, 1979.
24. Lehnert, W. The Process of Question Answering. Lawrence Erlbaum Associates, Hillsdale, N.J., 1978.
25. Levin, J. A., and Moore, J. A. Dialogue Games: Meta-Communication Structures for Natural Language Understanding. Cognitive Science 1, 4 (1977), 395-420.
26. Levin, J. A. and Goldman, N. M. Process Models of Reference in Context. ISI/RR-78-72, Information Sciences Institute, U. of Southern California, October, 1978.
27. Mann, W. C., Moore, J. A., and Levin, J. A. A Comprehension Model for Human Dialogue. Proc. Fifth Int. Jt. Conf. on Artificial Intelligence, MIT, 1977, pp. 77-87.
28. McKeown, K. Paraphrasing Using Given and New Information in a Question-Answering System. Association for Computational Linguistics Meeting, San Diego, 1979.
29. Minsky, M. A Framework for Representing Knowledge. In Winston, P., Ed., The Psychology of Computer Vision, McGraw Hill, 1975, pp. 211-277.
30. Parkison, R. C., Colby, K. M., and Faught, W. S. Conversational Language Comprehension Using Integrated Pattern-Matching and Parsing. Artificial Intelligence 9 (1977), 111-134.
31. Raphael, B. SIR: A Computer Program for Semantic Information Retrieval. In Minsky, M. L., Ed., Semantic Information Processing, MIT Press, Cambridge, Mass., 1968, pp. 33-134.
32. Riesbeck, C. K. Computational Understanding: Analysis of Sentences and Context. Working Paper 4, Istituto per gli Studi Semantici e Cognitivi, Castagnola, Switzerland, 1974.
33. Sacerdoti, E. D. Language Access to Distributed Data with Error Recovery. Proc. Fifth Int. Jt. Conf. on Artificial Intelligence, MIT, 1977, pp. 196-202.
34. Sacerdoti, E. D. Planning in a Hierarchy of Abstraction Spaces. Artificial Intelligence 5, 2 (1974), 115-135.
35. Scragg, G. W. Answering Questions about Processes. Ph.D. Th., U. of California, San Diego, 1974.
36. Searle, J. R. Speech Acts. Cambridge University Press, 1969.
37. Sidner, C. A Progress Report on the Discourse and Reference Components of PAL. A. I. Memo. 468, MIT A. I. Lab., 1978.
38. Sussman, G. J. and McDermott, D. V. From PLANNER to CONNIVER - a Genetic Approach. Proc. Fall Joint Computer Conf., AFIPS Press, Montvale, N.J., 1972, pp. 1171-1179.
39. Walker, D. Speech Understanding Research, Final Report, Project 4762. Artificial Intelligence Center, Stanford Research Institute, Menlo Park, Ca., 1976.
40. Waltz, D. L. An English Language Question Answering System for a Large Relational Data Base. Comm. ACM 21, 7 (1978), 526-539.
41. Weischedel, R. M. and Black, J. Responding to Potentially Unparseable Sentences. Tech. report 79/3, Dept. of Computer and Information Sciences, University of Delaware, 1979.
42. Weizenbaum, J. ELIZA - A Computer Program for the Study of Natural Language Communication between Man and Machine. Comm. ACM 9, 1 (January 1966), 36-45.
43. Wilks, Y. A. Preference Semantics. In Keenan, Ed., Formal Semantics of Natural Language, Cambridge University Press, 1975.
44. Winograd, T. Understanding Natural Language. Academic Press, New York, 1972.
45. Woods, W. A. Transition Network Grammars for Natural Language Analysis. Comm. ACM 13, 10 (October 1970), 591-606.
46. Woods, W. A., Kaplan, R. M., and Nash-Webber, B. The Lunar Sciences Language System: Final Report. 2378, Bolt, Beranek, and Newman, Inc., 1972.
47. Woods, W. A., Bates, M., Brown, G., Bruce, B., Cook, C., Klovstad, J., Makhoul, J., Nash-Webber, B., Schwartz, R., Wolf, J., and Zue, V. Speech Understanding Systems - Final Technical Report. 3438, Bolt, Beranek, and Newman, Inc., 1976.